Repeated Games With Intervention:
Theory and Applications in Communications

Yuanzhang Xiao, Jaeok Park, and Mihaela van der Schaar 

Abstract 

In communication systems where users share common resources, users' selfish behavior usually 
results in suboptimal resource utilization. There have been extensive works that model communication 
systems with selfish users as one-shot games and propose incentive schemes to achieve Pareto optimal 
action profiles as non-cooperative equilibria. However, in many communication systems, due to strong 
negative externalities among users, the sets of feasible payoffs in one-shot games are nonconvex. Thus, 
it is possible to expand the set of feasible payoffs by having users choose convex combinations of 
different payoffs. In this paper, we propose a repeated game model generalized by intervention. First, 
we use repeated games to convexify the set of feasible payoffs in one-shot games. Second, we combine 
conventional repeated games with intervention, originally proposed for one-shot games, to achieve a 
larger set of equilibrium payoffs and loosen requirements for users' patience to achieve it. We study the 
problem of maximizing a welfare function defined on users' equilibrium payoffs, subject to minimum 
payoff guarantees. Given the optimal equilibrium payoff, we derive the minimum intervention capability 
required and design corresponding equilibrium strategies. The proposed generalized repeated game 
model applies to various communication systems, such as power control and flow control. 

Index Terms 

Repeated games, Intervention, Power control, Flow control 

I. Introduction 

Game theory is a formal framework to model and analyze the interactions of selfish agents. 
It has been used in the literature to study communication networks with selfish agents [1][2]. 
Most works modeled communication systems as one-shot games, studied the inefficiency of
noncooperative outcomes, and proposed incentive schemes, such as pricing and auctions [3]- 
[10], to improve the inefficient outcomes towards the Pareto boundary. 

Recently, a new incentive scheme, called "intervention", has been proposed [11], with applications
to medium access control (MAC) games [12][13] and power control games [14].¹
In an intervention scheme, the designer places an intervention device, which has a monitoring 
technology to monitor the user behavior and an intervention capability to intervene in their 
interaction, in the system. The intervention device observes a signal about the actions of agents, 
and chooses an intervention action depending on the observed signal. In this way, it can punish 
misbehavior of an agent by exerting intervention following a signal that suggests a deviation. 
One of the advantages of intervention is that the intervention device directly interacts with the 
users in the system, instead of using outside instruments such as monetary payments as pricing 
and auctions do. As a result, intervention can provide more robust incentives in the sense that 
agents cannot avoid intervention. Moreover, in contrast to pricing and auctions, intervention 
requires no knowledge on the users' valuation of the resource usage in some scenarios [14]. 

In some communication systems where users create severe congestion or interference, increas- 
ing a user's payoff requires a significant sacrifice of others' payoffs. This feature is reflected by a 
nonconvex set of feasible payoffs in some systems studied in the aforementioned works [3]-[15] 
using one-shot game models. For example, in one-shot power control games, the set of feasible 
payoffs is nonconvex when the cross channel gains are large [16] [17]. In one-shot MAC games 
based on the collision model, the set of feasible payoffs is also nonconvex, because transmissions 
from multiple users cause packet loss [8] [18]. Moreover, we will see in this paper that the sets 
of feasible payoffs of some one-shot flow control games are also nonconvex. To sum up, the 
sets of feasible payoffs are nonconvex in many communication scenarios, and when the set of 
feasible payoffs is nonconvex, its Pareto boundary can be dominated by a convex combination 
of different payoffs. In one-shot games, such convex combinations cannot be achieved unless a 
public correlation device is used.²

¹With the same philosophy as intervention, a packet-dropping incentive scheme was proposed for flow control games in [15].

²Public correlation devices are used in the game theory literature to simplify the construction of payoffs that are convex
combinations of pure-action payoffs. Such devices may not be available in communication networks. Even if such devices
exist, broadcasting the random signals that they generate incurs additional costs.

Although some works in power control [16] [17] and medium access control [18] proposed
time-sharing solutions that achieve payoffs beyond the set of feasible payoffs in one-shot games, 
they did not consider users' incentives. Specifically, in a time-sharing protocol, a user may 
transmit at the time slots that are assigned to other users, in order to obtain a higher payoff. Hence, 
it is important to study deviation-proof protocols, which make it in the self-interest of users to 
comply with the protocols. To this end, we use repeated games to model the communication 
scenarios. In a repeated game, a stage game is played repeatedly, and a user's payoff in the 
repeated game is the discounted average or the limit of the mean of the stage-game payoffs. 
Users can choose different actions in the stage games in different periods, resulting in a convex 
combination of different stage-game payoffs as the repeated game payoff. A repeated game 
strategy prescribes what action to take given past observations and thus can be interpreted as 
a protocol. If a repeated game strategy constitutes a subgame perfect equilibrium (SPE), then 
no user can gain from deviation at any occasion. Hence, an SPE strategy can be considered as a
deviation-proof protocol. 

In this paper, we consider the protocol design problem of maximizing some welfare function 
defined on users' (subgame perfect) equilibrium payoffs, subject to minimum payoff guarantees 
for all users. When we design a protocol in the repeated game framework, there are three
important considerations, which also motivate us to introduce intervention in a repeated game.
The first one is the set of equilibrium payoffs, which characterizes payoffs that can be achieved
at an SPE. Since the designer is optimizing some welfare function, for example, the sum or the 
minimum of all users' payoffs, this set, along with the minimum payoff guarantees, determines 
the feasible set of the optimization problem. Consequently, a larger set of equilibrium payoffs 
can result in higher social welfare. In this paper, we will characterize the set of equilibrium 
payoffs in repeated games with intervention, and show that using intervention can yield a larger 
set of equilibrium payoffs than the corresponding set without intervention. 

An illustration of the promising gain by using repeated games with intervention is shown in 
Fig. 2. We plot the set of equilibrium payoffs in a two-user flow control game³ under different
incentive schemes. We use the same intervention device (thus the same intervention capability) in
the games with intervention. We can see that the set of equilibrium payoffs of the repeated game
with intervention includes the set of equilibrium payoffs in the one-shot game with intervention
and the set of equilibrium payoffs in the repeated game without intervention.

³Although we use flow control as an example, the qualitative result is true for other scenarios, such as power control and
medium access control.

The second consideration is the discount factor, which is affected by the user patience and the 
network dynamics. The discount factor represents the rate at which users discount future payoffs; 
a more patient user has a larger discount factor. The discount factor can also model the probability 
of users remaining in the network in each period; a more dynamic network results in a smaller 
discount factor. As we have mentioned above, the designer aims to maximize some welfare 
function on the feasible set determined by the set of equilibrium payoffs and minimum payoff 
guarantees. Given the target payoff that is in the feasible set and maximizes the welfare function, 
there is a minimum discount factor required to achieve it as an SPE payoff. A lower minimum
discount factor is desirable, because a protocol with a lower requirement remains effective
for a wider variety of users and in more dynamic networks. In this paper, we will determine the
minimum requirement on the discount factor to support the target payoff, and show that using 
intervention can lower the minimum requirement compared to the case without intervention. 
Moreover, we obtain a trade-off between the discount factor, the minimum payoff guarantees, 
and the intervention capability. Hence, given a discount factor, we can calculate the minimum 
intervention capability required to support the target payoff. Conversely, given the intervention 
capability available, we can calculate the minimum requirement on the discount factor, and thus
determine the types of users and networks that can be supported. 

The last consideration is the equilibrium strategy. Given a target payoff and the discount factor, 
we show how to construct a candidate equilibrium strategy, namely the deviation-proof protocol. 
We will also see that intervention can simplify the users' equilibrium strategies, which reduces 
the complexity of users' devices to execute a protocol. 

To the best of our knowledge, our paper is the first one to study repeated games with 
intervention systematically, addressing the above three considerations. The rest of this paper 
is organized as follows. Section II describes a repeated game model generalized by intervention 
and formulates a protocol design problem. In Section III, we characterize the set of equilibrium 
payoffs when the discount factor is close to one, and specify the structure of equilibrium 
strategies. Then we analyze the design problem in detail in Section IV. Simulation results
are presented in Section V. Finally, Section VI concludes the paper. 
II. Model of Repeated Games With Intervention

A. The Stage Game With Intervention 

We consider a system with $N$ users. The set of the users is denoted by $\mathcal{N} = \{1, 2, \ldots, N\}$.
Each user $i$ chooses its action⁴ $a_i$ from the set $A_i \subset \mathbb{R}^{k_i}$ for some integer $k_i > 0$. The set of
action profiles is denoted by $\mathcal{A} = \prod_{i=1}^{N} A_i$, and the action profile of all the users is denoted
by $a = (a_1, \ldots, a_N) \in \mathcal{A}$. Let $a_{-i}$ be the action profile of all the users other than user $i$. In
addition to the $N$ users, there exists an intervention device in the network, indexed by 0. The
intervention device chooses its action $a_0$ from the set $A_0 \subset \mathbb{R}^{k_0}$ for some integer $k_0 > 0$. We
call the set $A_0$ the intervention capability (of the intervention device), because it determines the
actions that the intervention device can take when it intervenes in the interaction among users.
The payoffs of the users are determined by the actions of the users and the intervention device,
and user $i$'s payoff function is denoted by $u_i : A_0 \times \mathcal{A} \to \mathbb{R}$.

We assume that there exists a null intervention action, denoted by $\underline{a}_0 \in A_0$, which corresponds
to the case where there is no intervention device. We further assume that an intervention action
can only decrease the payoffs of the users, i.e., $u_i(a_0, a) \le u_i(\underline{a}_0, a)$ for all $a_0 \in A_0$, all $a \in \mathcal{A}$,
and all $i$. In other words, intervention can provide only punishment to users, not rewards.

An important feature of the intervention device is that it does not have its own objective and
can be programmed in the way that the protocol designer desires. Hence, the Nash equilibrium
$(a_0^*, a^*)$ of the stage game with intervention is defined by

$$u_i(a_0^*, a^*) \ge u_i(a_0^*, a_i, a_{-i}^*), \quad \forall i \in \mathcal{N},\ \forall a_i \in A_i. \tag{1}$$

The Nash equilibrium $(\underline{a}_0, a^*)$ without intervention is defined similarly by fixing $a_0^* = \underline{a}_0$ in (1).

B. The Repeated Game With Intervention 

In the repeated game, the stage game is played in every period $t = 0, 1, 2, \ldots$. At the end
of period $t$, all the users and the intervention device observe the action profile at period $t$,
denoted by $(a_0^t, a^t)$. That is, we assume perfect monitoring. As a result, the users and the
intervention device share the same history at each period, and the history at period $t$ is the
collection of all the actions taken before period $t$. We denote the history at period $t \ge 1$ by
$h^t = (a_0^0, a^0; a_0^1, a^1; \ldots; a_0^{t-1}, a^{t-1})$. The history at period 0 is set as $h^0 = \varnothing$. The set of possible
histories at period $t$ is denoted by $\mathcal{H}^t$, and the set of all possible histories by $\mathcal{H} = \bigcup_{t=0}^{\infty} \mathcal{H}^t$.

⁴We consider only pure actions in this paper. Hence, we use "action" to mean "pure action", and use "pure action" only when
we want to emphasize it.

The (pure) strategy of user $i$ is a mapping from the set of all possible histories to its action
set, written as $\sigma_i : \mathcal{H} \to A_i$, $i \in \mathcal{N}$. User $i$'s action at history $h^t$ is determined by $a_i^t = \sigma_i(h^t)$.
The joint strategy of all the users is $\sigma = (\sigma_1, \ldots, \sigma_N)$, and the joint strategy of all the users other
than user $i$ is $\sigma_{-i}$. The joint action at history $h^t$ is $a^t = \sigma(h^t)$. The action of the intervention
device at history $h^t$ is determined by $a_0^t = \sigma_0(h^t)$, where $\sigma_0 : \mathcal{H} \to A_0$ is the intervention rule.⁵
When the intervention rule is constant at $\underline{a}_0$, namely $\sigma_0(h) = \underline{a}_0$ for all $h \in \mathcal{H}$, the repeated
game with intervention reduces to the conventional repeated game without intervention.

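The overall payoff is the normalized sum of discounted payoffs at each period. We assume
all the users have the same discount factor $\delta \in [0, 1)$. Then the payoff function of user $i$ in the
repeated game is

$$U_i(\sigma_0, \sigma) = (1 - \delta) \sum_{t=0}^{\infty} \delta^t u_i\big(a_0^t(\sigma_0, \sigma), a^t(\sigma_0, \sigma)\big), \tag{2}$$

where $(a_0^t(\sigma_0, \sigma), a^t(\sigma_0, \sigma))$ is the $t$th-period actions of the intervention device and the users
induced by the intervention rule $\sigma_0$ and the joint strategy $\sigma$. $(a_0^t(\sigma_0, \sigma), a^t(\sigma_0, \sigma))$ can be calculated
recursively as

$$\begin{aligned}
\big(a_0^t(\sigma_0, \sigma), a^t(\sigma_0, \sigma)\big) = \Big( & \sigma_0\big(a_0^0(\sigma_0, \sigma), a^0(\sigma_0, \sigma); \ldots; a_0^{t-1}(\sigma_0, \sigma), a^{t-1}(\sigma_0, \sigma)\big), \\
& \sigma\big(a_0^0(\sigma_0, \sigma), a^0(\sigma_0, \sigma); \ldots; a_0^{t-1}(\sigma_0, \sigma), a^{t-1}(\sigma_0, \sigma)\big) \Big).
\end{aligned} \tag{3}$$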
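The recursion in (3) and the payoff in (2) can be illustrated with a short sketch. The function names (`outcome_path`, `discounted_payoff`), the two example rules, and all numbers below are hypothetical and chosen only for illustration; the payoff is truncated to finitely many periods, so it only approximates the infinite sum.

```python
from typing import Callable, List, Tuple

# A history is a tuple of past (a0, a) pairs; the empty tuple plays the role of h^0.
Profile = Tuple[float, Tuple[float, ...]]
History = Tuple[Profile, ...]


def outcome_path(sigma0: Callable[[History], float],
                 sigmas: List[Callable[[History], float]],
                 periods: int) -> List[Profile]:
    """Unroll recursion (3): each period's actions are the intervention rule
    and the users' strategies applied to the history generated so far."""
    history: History = ()
    for _ in range(periods):
        a0 = sigma0(history)
        a = tuple(s(history) for s in sigmas)
        history = history + ((a0, a),)
    return list(history)


def discounted_payoff(stage_payoffs: List[float], delta: float) -> float:
    """Finite-horizon truncation of the normalized discounted sum in (2)."""
    return (1 - delta) * sum(delta ** t * u for t, u in enumerate(stage_payoffs))


# Hypothetical strategy: a user that always plays 1.0.
def always_one(history: History) -> float:
    return 1.0


# Hypothetical intervention rule: null action 0.0 unless some user
# exceeded 1.0 in the previous period, in which case play 2.0.
def punish_after_excess(history: History) -> float:
    if history and max(history[-1][1]) > 1.0:
        return 2.0
    return 0.0


path = outcome_path(punish_after_excess, [always_one, always_one], periods=3)
payoff = discounted_payoff([1.0, 1.0, 1.0], delta=0.5)
```

Representing a strategy as a function of the whole history mirrors the definition $\sigma_i : \mathcal{H} \to A_i$; in practice a protocol would usually keep a compact state instead of the full history.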

User $i$'s continuation strategy induced by any history $h^t \in \mathcal{H}$, denoted $\sigma_i|_{h^t}$, is defined by
$\sigma_i|_{h^t}(h^\tau) = \sigma_i(h^t h^\tau)$, $\forall h^\tau \in \mathcal{H}$, where $h^t h^\tau$ is the concatenation of the history $h^t$ followed by
the history $h^\tau$. Similarly, we can define $\sigma_0|_{h^t}$ for the intervention device. By convention, we
denote by $\sigma|_{h^t}$ and $\sigma_{-i}|_{h^t}$ the strategy profile of all the users and the strategy profile of all the
users other than user $i$, induced by $h^t$, respectively. Then the subgame perfect equilibrium of
the repeated game is the intervention rule and the strategy profile $(\sigma_0, \sigma)$ that satisfies

$$U_i(\sigma_0|_{h^t}, \sigma|_{h^t}) \ge U_i(\sigma_0|_{h^t}, \sigma_i'|_{h^t}, \sigma_{-i}|_{h^t}), \quad \text{for all } \sigma_i', \text{ for all } i \in \mathcal{N}, \text{ and for all } h^t \in \mathcal{H}. \tag{4}$$

The subgame perfect equilibrium prescribes a strategy profile from which no user has incentive
to deviate at any period and at any history. Hence, the equilibrium strategy can be considered
as a deviation-proof protocol.

⁵Note that we do not use "intervention strategy" here, because the intervention device is not a strategic player and just follows
a rule prescribed by the designer.

C. Problem Formulation 

There is a protocol designer who chooses an intervention rule $\sigma_0$ and recommends the joint
strategy $\sigma$ to the users. We assume that the designer knows the structure of the game, including
the number of users, action spaces, and payoff functions. The designer maximizes a welfare
function defined over the repeated-game payoffs of the users, $W(U_1(\sigma_0, \sigma), \ldots, U_N(\sigma_0, \sigma))$. At
the maximum of the welfare function, some users may have low payoffs. To avoid this, the
designer provides a minimum payoff guarantee $\gamma_i$ for each user $i$. Hence, we can formally
define the protocol design problem as follows:

$$\begin{aligned}
\max_{\sigma_0, \sigma} \quad & W(U_1(\sigma_0, \sigma), \ldots, U_N(\sigma_0, \sigma)) \\
\text{s.t.} \quad & (\sigma_0, \sigma) \text{ is a subgame perfect equilibrium}, \\
& U_i(\sigma_0, \sigma) \ge \gamma_i, \quad \forall i \in \mathcal{N}.
\end{aligned} \tag{5}$$

Examples of welfare functions are the sum payoff $\sum_{i=1}^{N} U_i$ and the absolute fairness $\min_{i \in \mathcal{N}} U_i$.
Note that the first step towards solving the design problem is to characterize the set of equilibrium
payoffs, which will be the focus of the next section. Once the designer identifies what payoffs are
achievable as SPE, it can maximize $W$ over the obtained set of payoffs satisfying the constraints.
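As a rough illustration of the last step, the sketch below selects a welfare-maximizing payoff from a finite list of candidate equilibrium payoffs. It assumes that the set of SPE payoffs has already been characterized and sampled into such a list; the function name `best_payoff` and the candidate values are hypothetical.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Payoff = Tuple[float, ...]


def sum_payoff(v: Sequence[float]) -> float:
    """Sum-payoff welfare function."""
    return sum(v)


def min_payoff(v: Sequence[float]) -> float:
    """Absolute-fairness welfare function (minimum over users)."""
    return min(v)


def best_payoff(candidates: List[Payoff],
                guarantees: Sequence[float],
                welfare: Callable[[Sequence[float]], float]) -> Optional[Payoff]:
    """Return the candidate that maximizes the welfare function among those
    meeting every user's minimum payoff guarantee, or None if none does."""
    feasible = [v for v in candidates
                if all(vi >= gi for vi, gi in zip(v, guarantees))]
    if not feasible:
        return None
    return max(feasible, key=welfare)


# Hypothetical sampled equilibrium payoffs for two users.
candidates = [(3.0, 0.5), (2.5, 1.0), (1.5, 1.5)]
guarantees = (1.0, 1.0)

best_sum = best_payoff(candidates, guarantees, sum_payoff)
best_fair = best_payoff(candidates, guarantees, min_payoff)
```

With these numbers, the first candidate has the largest sum but violates user 2's guarantee, so the two welfare functions pick different points among the remaining candidates.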

III. The Set of Equilibrium Payoffs And Structures of Equilibrium Strategies

In this section, we determine the set of equilibrium payoffs for repeated games with in- 
tervention, when the discount factor is sufficiently close to 1. For the protocol designer, it is 
important to recognize which payoffs are achievable at SPE. The set of equilibrium payoffs in 
conventional repeated games, when the discount factor is sufficiently close to 1, is characterized 
by folk theorems. In this section, we adapt conventional folk theorems for repeated games with 
intervention. Our proofs are constructive and thus yield structures of the equilibrium strategies. 
Our results show that intervention can enlarge the set of equilibrium payoffs and enable users 
to use simpler strategies while sufficient conditions for folk theorems are still satisfied. 

To state folk theorems, we need to define pure-action minmax payoffs with intervention. 
Definition 1 (Pure-action Minmax Payoff With Intervention): User $i$'s pure-action minmax payoff
with intervention is defined as

$$\underline{v}_i^w = \min_{a_0, a_{-i}} \max_{a_i} u_i(a_0, a_i, a_{-i}). \tag{6}$$

We say a payoff $v$ is strictly individually rational if $v_i > \underline{v}_i^w$ for all $i \in \mathcal{N}$. Then the set of
feasible and strictly individually rational payoffs can be written as

$$\mathcal{V}^{\dagger w} = \{v \in \mathcal{V}^w : v_i > \underline{v}_i^w,\ \forall i \in \mathcal{N}\}, \tag{7}$$

where $\mathcal{V}^w$ is the set of feasible payoffs, defined by

$$\mathcal{V}^w = \mathrm{co}\,\{v \in \mathbb{R}^N : \exists (a_0, a) \in A_0 \times \mathcal{A} \text{ s.t. } v = u(a_0, a)\}, \tag{8}$$

where $\mathrm{co}\,X$ denotes the convex hull of a set $X$.

In the rest of this section, we will prove two folk theorems, depending on whether there exists a
mutual minmax profile.

A. Games With Mutual Minmax Profiles 

First, we prove the folk theorem for the games that have mutual minmax profiles. Before we 
state the folk theorem, we define the mutual minmax profile for repeated games with intervention. 

Definition 2 (Mutual Minmax Profile): An action profile $(\hat{a}_0, \hat{a})$ is a mutual minmax profile
if it satisfies $\underline{v}_i^w = \min_{a_0, a_{-i}} \max_{a_i} u_i(a_0, a_i, a_{-i}) = \max_{a_i} u_i(\hat{a}_0, a_i, \hat{a}_{-i})$, $\forall i \in \mathcal{N}$.

Now we can state the minmax folk theorem for repeated games with intervention as follows.

Proposition 1 (Minmax Folk Theorem): Suppose that there exists a mutual minmax profile
$(\hat{a}_0, \hat{a})$. Then for every feasible and strictly individually rational payoff $v \in \mathcal{V}^{\dagger w}$, there exists
$\underline{\delta} < 1$ such that for all $\delta \in (\underline{\delta}, 1)$, there exists a subgame perfect equilibrium with payoffs $v$ of
the repeated game with intervention.

Proof: See [19, Appendix A]. ■

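1) Structure of the equilibrium strategy: We first briefly describe the structure of the equilibrium
strategy. Then we formally present the equilibrium strategy as an automaton [20].

Suppose we want to achieve an SPE payoff $v$. When the discount factor is sufficiently close to
1, there exists a sequence of action profiles $\{a_0^\tau, a^\tau\}_{\tau=0}^{T-1}$ for some integer $T > 0$, which satisfies

$$\frac{1 - \delta}{1 - \delta^T} \sum_{\tau=0}^{T-1} \delta^\tau u(a_0^\tau, a^\tau) = v. \tag{9}$$

Note that $T$ is infinite in general. When $T = \infty$, the sequence $\{a_0^\tau, a^\tau\}_{\tau=0}^{\infty}$ yields the desired
payoff $(1 - \delta) \sum_{\tau=0}^{\infty} \delta^\tau u(a_0^\tau, a^\tau) = v$. When $T$ is finite, repeating the sequence $\{a_0^\tau, a^\tau\}_{\tau=0}^{T-1}$
forever yields the desired payoff

$$(1 - \delta) \left( \sum_{\tau=0}^{T-1} \delta^\tau u(a_0^\tau, a^\tau) + \delta^T \sum_{\tau=0}^{T-1} \delta^\tau u(a_0^\tau, a^\tau) + \cdots \right) = \frac{1 - \delta}{1 - \delta^T} \sum_{\tau=0}^{T-1} \delta^\tau u(a_0^\tau, a^\tau) = v. \tag{10}$$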
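The closed form for a finite sequence repeated forever can be checked numerically. The sketch below is only an illustration under assumed inputs: `cycle` holds one user's stage payoffs along a hypothetical cycle of length $T$, and the helper names are not from the paper.

```python
from typing import List


def cycle_payoff(cycle: List[float], delta: float) -> float:
    """Closed form: (1 - delta) / (1 - delta**T) times the discounted sum
    of one pass through the cycle of length T."""
    T = len(cycle)
    one_pass = sum(delta ** t * u for t, u in enumerate(cycle))
    return (1 - delta) / (1 - delta ** T) * one_pass


def truncated_payoff(cycle: List[float], delta: float, repeats: int) -> float:
    """Direct evaluation of the normalized discounted sum when the cycle is
    repeated `repeats` times (approximates repeating it forever)."""
    total = 0.0
    for t in range(repeats * len(cycle)):
        total += delta ** t * cycle[t % len(cycle)]
    return (1 - delta) * total


# Hypothetical time-sharing cycle: the user gets payoff 1 in even periods
# and payoff 0 in odd periods.
closed_form = cycle_payoff([1.0, 0.0], delta=0.5)
direct = truncated_payoff([1.0, 0.0], delta=0.5, repeats=60)
```

In this example the user's payoff is 2/3 rather than 1/2, because discounting puts more weight on the earlier period of each cycle; the weights approach the time shares as $\delta$ approaches 1.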

In the equilibrium strategy, the intervention device and the users start from $(a_0^0, a^0)$ at $t = 0$,
and follow the sequence $\{a_0^\tau, a^\tau\}_{\tau=1}^{T-1}$ afterwards. If $T$ is finite, they repeat the sequence
$\{a_0^\tau, a^\tau\}_{\tau=0}^{T-1}$ forever. Since this sequence is played at the equilibrium, it is also called the equilibrium
outcome path. When a deviation from user $i$ happens, the intervention device plays $\hat{a}_0$, and the
other users play $\hat{a}_{-i}$. The minmaxing action $\hat{a}_0$ of the intervention device and the minmaxing
action profile $\hat{a}_{-i}$ of the users other than user $i$ last for $L$ periods as punishment. After the $L$
periods of punishment, the intervention device and the users return to the equilibrium outcome
path. Any deviation in the $L$ periods of punishment will trigger another new $L$ periods of
punishment for the deviating user. The equilibrium strategy can be described by the automaton
in Fig. 3.
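The strategy described above can be sketched as a small state machine. This is a simplified illustration, not a transcription of the automaton in Fig. 3: the class name `MinmaxAutomaton` is hypothetical, the whole mutual minmax profile is prescribed during punishment, any mismatch with the prescribed user actions counts as a deviation, and after punishment the path is assumed to resume at the position where it was interrupted.

```python
from typing import List, Tuple

Profile = Tuple[float, Tuple[float, ...]]  # (device action, user actions)


class MinmaxAutomaton:
    """Equilibrium path with L-period mutual-minmax punishment (sketch)."""

    def __init__(self, path: List[Profile], minmax: Profile, punish_len: int):
        self.path = path            # equilibrium outcome path, repeated cyclically
        self.minmax = minmax        # mutual minmax profile used as punishment
        self.punish_len = punish_len
        self.pos = 0                # position on the equilibrium path
        self.remaining = 0          # punishment periods still to be played

    def prescribed(self) -> Profile:
        """Action profile the protocol prescribes for the current period."""
        if self.remaining > 0:
            return self.minmax
        return self.path[self.pos % len(self.path)]

    def update(self, observed_users: Tuple[float, ...]) -> None:
        """Advance the state after observing the users' actions this period."""
        _, expected = self.prescribed()
        if tuple(observed_users) != tuple(expected):
            # A deviation (on the path or during punishment) starts a
            # fresh punishment phase of full length.
            self.remaining = self.punish_len
        elif self.remaining > 0:
            self.remaining -= 1
        else:
            self.pos += 1


# Hypothetical two-user example with a constant equilibrium path.
auto = MinmaxAutomaton(path=[(0.0, (1.0, 1.0))],
                       minmax=(2.0, (3.0, 3.0)),
                       punish_len=2)
trace = []
for observed in [(1.0, 3.0), (3.0, 3.0), (3.0, 3.0), (1.0, 1.0)]:
    trace.append(auto.prescribed())
    auto.update(observed)
```

Here user 2 deviates in the first period, the mutual minmax profile is prescribed for the next two periods, and play then returns to the equilibrium path.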

2) Discussions and examples: From Proposition 1, we can see that any payoff that strictly 
dominates the minmax payoff can be achieved as a subgame perfect equilibrium payoff for some 
discount factors in repeated games with intervention. Here the impact of intervention is two-fold. 
First, intervention can decrease the minmax payoff, which enlarges the set of equilibrium payoffs. 
Second, intervention may provide a mutual minmax profile that is a Nash equilibrium for the 
stage game, while the mutual minmax profile in the original stage game without intervention may 
not be a Nash equilibrium. If the mutual minmax profile is a Nash equilibrium, the punishment 
can be playing the mutual minmax profile forever regardless of which user deviated, and the 
users cannot deviate from this severe punishment. On the other hand, for the mutual minmax 
profile that is not a Nash equilibrium, we can use it as punishment for only a finite number of 
periods and use the promise of returning to the equilibrium outcome path to deter users from 
deviating in the punishment phase. The latter punishment is weaker than the former one,
and is more complicated, both in the associated automaton and in the punishment length $L$,
which must be chosen properly.

Example 1 (Power Control): Now we consider a power control game to illustrate how inter- 
vention enlarges the set of equilibrium payoffs by decreasing the minmax payoff. 
Consider a network with $N$ users transmitting power in a wireless channel. User $i$'s action is its
transmit power $a_i \in A_i = [0, \bar{a}_i]$. The intervention device also transmits power $a_0 \in A_0 = [0, \bar{a}_0]$.
User $i$'s signal-to-interference-and-noise ratio (SINR) is calculated by
$\frac{h_{ii} a_i}{h_{i0} a_0 + \sum_{j \ne i} h_{ij} a_j + n_i}$, where
$h_{ij}$ is the channel gain from user $j$'s transmitter to user $i$'s receiver (with $j = 0$ denoting the
intervention device), and $n_i$ is the noise power
at user $i$'s receiver. Each user $i$'s stage-game payoff is its throughput [4] [17] [22], namely

$$u_i(a_0, a_i, a_{-i}) = \log_2 \left( 1 + \frac{h_{ii} a_i}{h_{i0} a_0 + \sum_{j \ne i} h_{ij} a_j + n_i} \right). \tag{11}$$

Note that the payoff function can be an arbitrary increasing function of the SINR without
changing the following analysis.

In this power control game, the null intervention action, which corresponds to the case with no
intervention, is $\underline{a}_0 = 0$. Without intervention (i.e., $a_0$ is fixed at 0), the only Nash equilibrium of
the stage game is $(\underline{a}_0, \bar{a}) = (0, \bar{a}_1, \ldots, \bar{a}_N)$, where every user transmits at its maximum power.
Moreover, the Nash equilibrium is the mutual minmax profile with payoff $\underline{v}^o = (\underline{v}_1^o, \ldots, \underline{v}_N^o) =
u(0, \bar{a})$. With intervention, the mutual minmax profile $(\bar{a}_0, \bar{a})$ is also a Nash equilibrium, with
payoff $\underline{v}^w = (\underline{v}_1^w, \ldots, \underline{v}_N^w) = u(\bar{a}_0, \bar{a}) \le u(0, \bar{a})$. Note that $\underline{v}_i^w \le \underline{v}_i^o$ for all $i$, and that each $\underline{v}_i^w$
decreases as $\bar{a}_0$ increases. Hence, the set of equilibrium payoffs when the discount factor is sufficiently
close to 1 expands as the maximum intervention power $\bar{a}_0$ increases.
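The effect of the maximum intervention power on the minmax payoffs can be illustrated numerically. The sketch below evaluates the throughput payoff at the profile where every user and the device transmit at maximum power; the channel gains, noise powers, and power limits are hypothetical values, and the function names are illustrative only.

```python
import math
from typing import List, Sequence


def throughput(i: int, a0: float, a: Sequence[float],
               h: Sequence[Sequence[float]], h0: Sequence[float],
               noise: Sequence[float]) -> float:
    """Throughput of user i: log2(1 + SINR), where the interference includes
    the intervention device's power a0 weighted by its gain h0[i]."""
    interference = h0[i] * a0 + sum(h[i][j] * a[j]
                                    for j in range(len(a)) if j != i)
    sinr = h[i][i] * a[i] / (interference + noise[i])
    return math.log2(1 + sinr)


def minmax_payoffs(a0_max: float, a_max: Sequence[float],
                   h: Sequence[Sequence[float]], h0: Sequence[float],
                   noise: Sequence[float]) -> List[float]:
    """Payoffs at the mutual minmax profile, where the device and all users
    transmit at maximum power (a0_max = 0 gives the case without intervention)."""
    return [throughput(i, a0_max, a_max, h, h0, noise)
            for i in range(len(a_max))]


# Hypothetical symmetric two-user channel.
h = [[1.0, 0.5], [0.5, 1.0]]   # h[i][j]: gain from transmitter j to receiver i
h0 = [0.5, 0.5]                # gains from the intervention device
noise = [0.1, 0.1]
a_max = [1.0, 1.0]

without = minmax_payoffs(0.0, a_max, h, h0, noise)
with_small = minmax_payoffs(1.0, a_max, h, h0, noise)
with_large = minmax_payoffs(2.0, a_max, h, h0, noise)
```

Each user's payoff at the minmax profile falls as the device's maximum power grows, which is the mechanism by which intervention enlarges the set of strictly individually rational payoffs.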

Example 2: Now we consider a flow control game, whose mutual minmax profile without 
intervention may not be a Nash equilibrium. We show that intervention induces a mutual minmax 
profile that is a Nash equilibrium, and thus, enables a more severe punishment and a simpler 
equilibrium strategy. 

Consider a network with $N$ users transmitting packets through a single server, which can be
modeled as an M/M/1 queue [24]-[26]. User $i$'s action is its transmission rate $a_i \in A_i = [0, \bar{a}_i]$.
The intervention device also transmits packets at the rate of $a_0 \in A_0 = [0, \bar{a}_0]$. User $i$'s payoff
is a function of its transmission rate $a_i$ and its delay $1 / (\mu - a_0 - \sum_{j=1}^{N} a_j)$ [23]-[26], defined
by

$$u_i(a_0, a_i, a_{-i}) = a_i^{\beta_i} \max \Big\{ 0,\ \mu - a_0 - \sum_{j=1}^{N} a_j \Big\}, \tag{12}$$

where $\mu > 0$ is the server's service rate, and $\beta_i > 0$ is the parameter reflecting the trade-off
between the transmission rate and the delay. Here the "max" function indicates the fact that the
payoff is zero when the total arrival rate is larger than the service rate. We assume that the service
rate is no smaller than the maximum total arrival rate without intervention, i.e., $\mu \ge \sum_{j=1}^{N} \bar{a}_j$.

Without intervention, the mutual minmax profile is every user transmitting at the maximum
rate, i.e., $(0, \bar{a}_1, \ldots, \bar{a}_N)$. Since user $i$'s best response to the action profile $\bar{a}_{-i}$ is given by
$a_i^* = \min \big\{ \frac{\beta_i}{1 + \beta_i} (\mu - \sum_{j \ne i} \bar{a}_j),\ \bar{a}_i \big\}$, the mutual minmax profile is a Nash equilibrium without
intervention if and only if

$$\bar{a}_i \le \frac{\beta_i}{1 + \beta_i} \Big( \mu - \sum_{j \ne i} \bar{a}_j \Big), \quad \forall i \in \mathcal{N}. \tag{13}$$

Hence, without intervention, the mutual minmax profile may not be a Nash equilibrium. However,
with intervention, the mutual minmax profile $(\bar{a}_0, \bar{a}_1, \ldots, \bar{a}_N)$ is always a Nash equilibrium, as
long as the maximum rate $\bar{a}_0$ of the intervention device is high enough to yield

$$\mu - \sum_{j \ne i} \bar{a}_j - \bar{a}_0 \le 0, \quad \forall i \in \mathcal{N}. \tag{14}$$
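The two Nash equilibrium conditions for the flow control game can be checked with a few lines of code. This is a minimal sketch under assumed parameters: the function names and the numerical values of the service rate, rate limits, and trade-off parameters are hypothetical, and the payoff is taken to be the rate raised to the trade-off parameter times the (nonnegative) residual service rate.

```python
from typing import Sequence


def payoff(i: int, a0: float, a: Sequence[float],
           mu: float, beta: Sequence[float]) -> float:
    """Stage payoff: a_i^beta_i times the residual service rate, floored at 0."""
    return a[i] ** beta[i] * max(0.0, mu - a0 - sum(a))


def best_response(i: int, a0: float, a: Sequence[float],
                  a_max: Sequence[float], mu: float,
                  beta: Sequence[float]) -> float:
    """Best response of user i to the device's rate and the others' rates."""
    residual = mu - a0 - sum(a[j] for j in range(len(a)) if j != i)
    if residual <= 0:
        # Payoff is zero for every rate, so any rate is a best response;
        # the maximum rate is returned by convention.
        return a_max[i]
    return min(a_max[i], beta[i] / (1 + beta[i]) * residual)


def minmax_profile_is_ne(a0_max: float, a_max: Sequence[float],
                         mu: float, beta: Sequence[float]) -> bool:
    """Check whether 'everyone at maximum rate' is a stage-game NE."""
    for i in range(len(a_max)):
        if best_response(i, a0_max, a_max, a_max, mu, beta) < a_max[i]:
            return False
    return True


# Hypothetical parameters: two users, service rate equal to total maximum rate.
mu, beta, a_max = 3.0, [1.0, 1.0], [1.5, 1.5]

ne_without = minmax_profile_is_ne(0.0, a_max, mu, beta)   # device silent
ne_with = minmax_profile_is_ne(1.5, a_max, mu, beta)      # device saturates server
```

With these numbers each user's best response to the other's maximum rate is 0.75, below its own maximum of 1.5, so the mutual minmax profile is not an equilibrium without intervention; once the device's rate exhausts the residual capacity, no user can obtain a positive payoff and the profile becomes an equilibrium.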

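To see why we prefer a mutual minmax profile that is an NE, we study the conditions under
which a strategy, described by the automaton in this subsection, is an SPE. For simplicity,
consider a special case where $T = \infty$ and $(a_0^\tau, a^\tau) = (0, a)$ for all $\tau$ in the automaton. When the
mutual minmax profile is not an NE, the conditions on the discount factor $\delta$ and the length of
the punishment phase $L$ can be derived from the proof of Proposition 1 as

$$\delta + \cdots + \delta^L \ge \frac{(a_i^*)^{\beta_i} \big( \mu - a_i^* - \sum_{j \ne i} a_j \big) - a_i^{\beta_i} \big( \mu - \sum_{j=1}^{N} a_j \big)}{a_i^{\beta_i} \big( \mu - \sum_{j=1}^{N} a_j \big) - \bar{a}_i^{\beta_i} \max \big\{ 0,\ \mu - \bar{a}_0 - \sum_{j=1}^{N} \bar{a}_j \big\}}, \quad \forall i, \tag{15}$$

where $a_i^* = \min \big\{ \bar{a}_i,\ \frac{\beta_i}{1 + \beta_i} (\mu - \sum_{j \ne i} a_j) \big\}$, and

$$\delta^L \ge \frac{(a_i^*)^{\beta_i} \max \big\{ 0,\ \mu - \bar{a}_0 - a_i^* - \sum_{j \ne i} \bar{a}_j \big\} - \bar{a}_i^{\beta_i} \max \big\{ 0,\ \mu - \bar{a}_0 - \sum_{j=1}^{N} \bar{a}_j \big\}}{a_i^{\beta_i} \big( \mu - \sum_{j=1}^{N} a_j \big) - \bar{a}_i^{\beta_i} \max \big\{ 0,\ \mu - \bar{a}_0 - \sum_{j=1}^{N} \bar{a}_j \big\}}, \quad \forall i, \tag{16}$$

where $a_i^* = \min \big\{ \bar{a}_i,\ \frac{\beta_i}{1 + \beta_i} (\mu - \bar{a}_0 - \sum_{j \ne i} \bar{a}_j) \big\}$. For the case without intervention, we let $\bar{a}_0 = 0$
in the above inequalities to get the corresponding conditions on $\delta$ and $L$. When the mutual
minmax profile is an NE, only the first inequality needs to be satisfied.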
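The interplay between the discount factor and the punishment length can be explored numerically. The sketch below assumes that the two conditions take the forms $\delta + \cdots + \delta^L \ge r_{15}$ and $\delta^L \ge r_{16}$, where $r_{15}$ and $r_{16}$ stand for the largest right-hand sides over all users; this form of the second condition is an assumption of the sketch, and the ratio values used are hypothetical. Since both left-hand sides increase in $\delta$, the smallest admissible $\delta$ can be found by bisection.

```python
from typing import Optional


def geometric_sum(delta: float, L: int) -> float:
    """delta + delta^2 + ... + delta^L."""
    return sum(delta ** k for k in range(1, L + 1))


def min_discount_factor(r15: float, r16: float, L: int,
                        tol: float = 1e-9) -> Optional[float]:
    """Smallest delta (up to tol) with geometric_sum(delta, L) >= r15 and
    delta**L >= r16, or None if no delta < 1 satisfies both.
    Set r16 = 0 when the mutual minmax profile is an NE (second condition
    not needed)."""
    if r15 >= L or r16 >= 1:
        return None  # neither condition can hold for any delta below 1

    def satisfied(delta: float) -> bool:
        return geometric_sum(delta, L) >= r15 and delta ** L >= r16

    lo, hi = 0.0, 1.0  # satisfied(hi) holds; shrink the bracket by bisection
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if satisfied(mid):
            hi = mid
        else:
            lo = mid
    return hi


# Hypothetical ratios: deviation gain is half the punishment loss on the
# path, and 30% of it in the punishment phase.
not_ne = [min_discount_factor(0.5, 0.3, L) for L in (1, 2, 3)]
is_ne = [min_discount_factor(0.5, 0.0, L) for L in (1, 2, 3)]
```

In this illustration the required discount factor rises with $L$ when the second condition is active, and falls with $L$ when it is not, which matches the qualitative behavior described for the cases where the mutual minmax profile is or is not an NE.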

Fig. 4 shows the minimum discount factor $\delta$ required for the strategy to be an SPE, under
different punishment lengths $L$ and maximum intervention flow rates $\bar{a}_0$. In this example (system
parameters shown in the caption of Fig. 4), when the maximum intervention flow rate is 2.5
bits/s, the mutual minmax profile is an NE. In this case, we do not need to provide the promise
of coming back from the punishment phase. In other words, we allow $L = \infty$. In this way, we
can achieve the minimum discount factor possible. Without intervention or when the maximum
intervention rate is 0.5 bits/s or 1.0 bits/s, the mutual minmax profile is not an NE. In this 
case, the minimum discount factor increases when the punishment length increases beyond a 
threshold. This is because the users need to be more patient to carry out longer punishment, 
reflected by (16). But note that under the same punishment lengths, the minimum discount factor 
with the maximum intervention flow rate 0.5 bits/s is still smaller than that without intervention. 
This example indicates that the punishment length should be carefully designed when the mutual 
minmax profile is not an NE. 

B. Games With Player-specific Punishments 

Now we study the folk theorem for repeated games with player-specific punishments. The 
definition of player-specific punishment [21, Definition 3.4.1] can be extended to the case with 
intervention as follows. 

Definition 3 (Player-specific punishments with intervention): A payoff $v \in \mathcal{V}^{\dagger w}$ allows player-specific
punishment if there exists a collection of payoff profiles $\{v^i\}_{i=1}^{N}$, $v^i \in \mathcal{V}^{\dagger w}$, such that
for all $i$, $v_i > v_i^i$, and for all $j \ne i$, $v_i^j > v_i^i$. The collection of payoff profiles $\{v^i\}_{i=1}^{N}$ is a
player-specific punishment for $v$.
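The two inequalities in this definition are easy to check mechanically. The sketch below tests them for a target payoff and a candidate collection of punishment payoffs; the function name and the numbers are hypothetical, and membership of the payoffs in the feasible and strictly individually rational set is assumed rather than verified.

```python
from typing import Sequence


def allows_player_specific_punishment(
        v: Sequence[float],
        punishments: Sequence[Sequence[float]]) -> bool:
    """punishments[i] is the payoff profile used to punish user i.
    User i must be strictly worse off under its own punishment than under
    the target v and than under any other user's punishment."""
    n = len(v)
    for i in range(n):
        own = punishments[i][i]
        if not v[i] > own:
            return False
        for j in range(n):
            if j != i and not punishments[j][i] > own:
                return False
    return True


# Hypothetical two-user example.
target = (2.0, 2.0)
good = [(1.0, 1.8), (1.8, 1.0)]   # each user is hurt most by its own punishment
bad = [(1.0, 1.8), (0.9, 1.0)]    # user 1 is worse off when user 2 is punished

ok_good = allows_player_specific_punishment(target, good)
ok_bad = allows_player_specific_punishment(target, bad)
```

The second inequality is what gives the non-deviating users an incentive to carry out the punishment: each of them prefers punishing another user to being punished itself.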

Proposition 2 (Folk Theorem With Player-specific Punishments): Suppose that $v \in \mathcal{V}^{\dagger w}$ allows
player-specific punishments in $\mathcal{V}^{\dagger w}$. There exists $\underline{\delta} < 1$ such that for all $\delta \in (\underline{\delta}, 1)$, there
exists a subgame perfect equilibrium with payoffs $v$ of the repeated game with intervention.

Proof: See [19, Appendix B]. ■

1) Structure of the equilibrium strategy: Suppose we want to achieve an SPE payoff $v$.
Again, we denote the equilibrium outcome path by $\{a_0^\tau, a^\tau\}_{\tau=0}^{T-1}$ for some integer $T > 0$. In the
equilibrium strategy, the intervention device and the users follow the sequence $\{a_0^\tau, a^\tau\}_{\tau=0}^{T-1}$, or
repeat this sequence forever when $T$ is finite. When a deviation from user $i$ happens, the sequence
$\{a_0^\tau(i), a^\tau(i)\}$ is played, generating $v^i$ in the player-specific punishment. The automaton of
this equilibrium strategy is similar to that with mutual minmax profiles, with the only difference
being the specific punishments for different users in this case. Due to space limits, we omit the
detailed description of the automaton.
2) Discussion and examples:

Example 3: We modify the flow control game in Example 2 with another form of intervention,
to illustrate that intervention can simplify the player-specific punishment. We consider an
intervention device that inspects the packets at the output of the server. It can identify the sources
of the packets and drop packets with certain probabilities. The intervention device's action is
denoted by a vector $a_0 = (a_{0,1}, \ldots, a_{0,N})$, with $a_{0,i} \in [0, 1]$ being the probability of dropping
user $i$'s packets. Then the stage-game payoff of user $i$ with a fixed $a_0$ is

(17)

Such an intervention device can carry out player-specific punishments all by itself. If a
unilateral deviation occurs, the intervention device will drop the packets of the deviating user
with probability 1, while it will not drop the other users' packets.

IV. Detailed Analysis of The Protocol Design Problem

In Section III, we have characterized the set of equilibrium payoffs when the discount factor
is sufficiently close to 1. With the set of equilibrium payoffs and minimum payoff guarantees,
we can obtain the optimal equilibrium payoff $(U_1^*, \ldots, U_N^*)$ that maximizes the welfare function
while satisfying minimum payoff guarantees. In addition, we have obtained the structure of the
equilibrium strategy $(\sigma_0, \sigma)$ that can achieve any equilibrium payoff when the discount factor is
sufficiently close to 1.

In this section, we provide detailed analysis of the protocol design problem under the practical
condition that the discount factor is bounded away from 1. Specifically, given the optimal
equilibrium payoff $(U_1^*, \ldots, U_N^*)$, we derive the minimum discount factor under which the
optimal equilibrium payoff can be achieved by an equilibrium strategy of the structures described
in Section III, and we construct the corresponding equilibrium strategy. For analytical tractability,
the analysis is carried out for a special class of games to be specified later. The power control
and flow control games in Example 1 and Example 2 are special cases of this class of games.

Note that the derived minimum discount factor is a function of the minimum payoff guarantees
and the intervention capability. Thus, we obtain the trade-off among the discount factor, the
minimum payoff guarantees, and the intervention capability. With this trade-off, we can determine
if we can achieve the optimal equilibrium payoff under a given discount factor, and if not,
how to change the minimum payoff guarantees such that the optimal equilibrium payoff can be
achieved. Moreover, we can determine the intervention capability required to achieve the optimal
equilibrium payoff under a given (proper) discount factor.

A. A Special Class of Games 

In the rest of the paper, we consider a special class of games that satisfy two assumptions. 

Assumption 1: With intervention, the stage game has a pure-action mutual minmax profile
$(\hat{a}_0, \hat{\mathbf{a}})$ (see Definition 2), which is also a Nash equilibrium of the stage game.

Remark 1: It is very common to have a pure-action mutual minmax profile as a Nash equilibrium
in resource allocation games, such as the power control and flow control games in
Section III. Specifically, if all the users share a common resource, i.e., $k_i = 1$ for all $i$, and
they consume as much of the resource as possible, each one of them will be minmaxed. In the power
control game in Example 2, the mutual minmax profile is the NE. In the flow control game, although
the mutual minmax profile may not be an NE without intervention, using intervention can make
it an NE, as we have shown in Section III.

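Assumption 2: For each user $i$, there exists an action profile $\tilde{\mathbf{a}}^i$ such that

$$u_i(\underline{a}_0, \tilde{\mathbf{a}}^i) = \max_{\mathbf{a}} u_i(\underline{a}_0, \mathbf{a}), \quad u_j(\underline{a}_0, \tilde{\mathbf{a}}^i) = 0, \ \forall j \neq i, \tag{18}$$

and that the set of feasible payoffs is $\mathcal{V}^{\dagger} = \mathrm{co}\{(0, \ldots, 0), \mathbf{u}(\underline{a}_0, \tilde{\mathbf{a}}^1), \ldots, \mathbf{u}(\underline{a}_0, \tilde{\mathbf{a}}^N)\}$.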

Remark 2: This assumption ensures the existence of the action profile $\tilde{\mathbf{a}}^i$, where user $i$ takes
the most advantage of the resources while the other users receive no benefit. The assumption
that $u_j(\underline{a}_0, \tilde{\mathbf{a}}^i) = 0, \forall j \neq i$ is natural in resource allocation problems, because the other users
should consume no resources in order to maximize a specific user's utility, and a user's utility
should be zero when it consumes no resource. In addition, due to significant interference among
the users, the set of feasible payoffs is the convex hull of the $N + 1$ points. Assumption 2 holds
true for the power control and flow control games in Section III. Note that we can generalize our
results to the case where $u_j(\underline{a}_0, \tilde{\mathbf{a}}^i) > 0, \forall j \neq i$, but we will not do that for notational simplicity.

With Assumption 2, we can write the Pareto boundary of the feasible set of the protocol
design problem explicitly as follows

$$\mathcal{P}_{\gamma} = \left\{ \mathbf{v} \in \mathcal{V}^{\dagger} : \sum_{i=1}^{N} (v_i / \bar{v}_i) = 1,\ v_i \ge \gamma_i,\ \forall i \in \mathcal{N} \right\}. \tag{19}$$

To ensure that the Pareto boundary is nonempty, not a singleton, and composed of individually
rational payoffs, we impose the following constraints on the minimum payoff guarantees $\gamma$:

$$\sum_{i=1}^{N} (\gamma_i / \bar{v}_i) < 1, \quad \text{and} \quad \underline{v}_i^w \le \gamma_i \le \bar{v}_i,\ \forall i. \tag{20}$$
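
The constraints in (20) are easy to check numerically before solving the design problem. The following is a minimal sketch under stated assumptions: the function and argument names are hypothetical, `v_bar` holds each user's maximum stage-game payoff, `v_minmax` holds the minmax payoffs with intervention, and the sum condition is read as strict (so that the Pareto boundary is not a singleton) while the per-user bounds are read as non-strict.

```python
def guarantees_are_admissible(gamma, v_bar, v_minmax):
    """Check the two conditions on the minimum payoff guarantees in (20).

    gamma    : minimum payoff guarantees, one per user
    v_bar    : maximum stage-game payoffs, one per user (assumed positive)
    v_minmax : minmax payoffs with intervention, one per user
    """
    if not (len(gamma) == len(v_bar) == len(v_minmax)):
        raise ValueError("all inputs must have one entry per user")
    # Normalized guarantees must sum to strictly less than 1.
    load = sum(g / vb for g, vb in zip(gamma, v_bar))
    # Each guarantee must be individually rational and attainable.
    within_bounds = all(lo <= g <= hi for lo, g, hi in zip(v_minmax, gamma, v_bar))
    return load < 1.0 and within_bounds
```

A designer could call this with candidate guarantees and relax them until the check passes; which inequalities are strict is an interpretation, so adjust the comparison operators if a different reading of (20) is intended.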

B. Trade-off Among Discount Factor, Minimum Payoff Guarantees, and Intervention Capability 

The optimal equilibrium payoff of the protocol design problem, also referred to as the target
payoff $\mathbf{v}^\star$, must lie in the Pareto boundary $\mathcal{P}_{\gamma}$. Given the target payoff $\mathbf{v}^\star \in \mathcal{P}_{\gamma}$, we determine
the minimum discount factor under which $\mathbf{v}^\star$ can be achieved as an SPE payoff by an equilibrium
strategy of the structure specified in Section III-A.

In general, it is very difficult to find the minimum discount factor under which a payoff is
achieved as an SPE payoff; see [27] for a discussion of this topic in the repeated prisoners' dilemma.
Here, we provide an upper bound $\bar{\delta}(\mathbf{v}^\star)$ of the minimum discount factor $\delta^\star(\mathbf{v}^\star)$ to achieve the
target payoff $\mathbf{v}^\star$ as an SPE payoff. To clearly state the result, we define the maximum stage-game
payoff that user $j$ can get by deviating from any profile in $\{(\underline{a}_0, \tilde{\mathbf{a}}^i)\}_{i=1}^{N}$ as

$$\nu_j = \max_{i \neq j} \max_{a_j} u_j(\underline{a}_0, a_j, \tilde{\mathbf{a}}^i_{-j}). \tag{21}$$

Theorem 1: With intervention, the minimum discount factor $\delta^\star(\mathbf{v}^\star)$ to achieve the target payoff
$\mathbf{v}^\star$ as an SPE payoff is upper bounded by

Si.n = max Lax 2(£-^^^^_ 1 

^ ^ \ j^i Wj - vf {N-T) + ^{N-Ty + A{T-S){N-l)j 

where T = EliMvi), and 5 = ElM/vi). 

Proof: See [19, Appendix C]. ■ 
Remark 3: First, $\bar{\delta}(\mathbf{v}^\star)$ and the minimum payoff guarantees $\gamma$ are linked through the target
payoff $\mathbf{v}^\star$. Different minimum payoff guarantees result in different target payoffs, which yield
different $\bar{\delta}(\mathbf{v}^\star)$. Second, $\bar{\delta}(\mathbf{v}^\star)$ is an increasing function of the minmax payoff $\{\underline{v}_i^w\}_{i=1}^{N}$, which
is determined by the intervention capability $A_0$. Since $\underline{v}_i^w \le \underline{v}_i^o$ for all $i$, applying intervention
to repeated games lowers the minimum requirement on the discount factor, and thus supports less
patient users in a more dynamic network.



November 11, 2011 



DRAFT 






C. The Equilibrium Strategy 

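With Assumption 1, we can use the equilibrium strategy in which the punishment for deviation
is playing the mutual minmax profile forever. The automaton of this equilibrium strategy is a
simplification of the one shown in Fig. 3. The set of states is $\mathcal{W} = \mathcal{W}_e \cup \{w_p\}$ with the initial
state $w_e(0)$, where $\mathcal{W}_e = \{w_e(\tau) : 0 \le \tau \le T - 1\}$ is the set of states in the equilibrium outcome
path, and $w_p$ is the state in the punishment phase. The action profile at each state is specified
by the output function

$$f(w) = \begin{cases} (a_0^\tau, \mathbf{a}^\tau), & \text{if } w = w_e(\tau) \\ (\hat{a}_0, \hat{\mathbf{a}}), & \text{if } w = w_p \end{cases} \tag{23}$$

The state transition is specified by $\lambda(w_p, (a_0, \mathbf{a})) = w_p,\ \forall (a_0, \mathbf{a})$, and

$$\lambda(w_e(\tau), (a_0, \mathbf{a})) = \begin{cases} w_e(\tau + 1 \bmod T), & \text{if } (a_0, \mathbf{a}) = (a_0^\tau, \mathbf{a}^\tau) \\ w_p, & \text{otherwise} \end{cases} \tag{24}$$

With Assumption 2, it is sufficient to choose every action profile $(a_0^\tau, \mathbf{a}^\tau)$ in the equilibrium
outcome path from the set $\{(\underline{a}_0, \tilde{\mathbf{a}}^i)\}_{i=1}^{N}$.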
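
The automaton described above (a cyclic equilibrium path plus a single absorbing punishment state) can be sketched in a few lines. This is an illustrative sketch only: the class name `GrimAutomaton`, the `PUNISH` label, and the encoding of an action profile as any hashable value are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Hashable, Sequence, Union

Profile = Hashable            # an action profile (a_0, a), e.g. a tuple
State = Union[int, str]       # tau in 0..T-1 for w_e(tau), or PUNISH for w_p

PUNISH = "punish"             # hypothetical label for the punishment state w_p


@dataclass(frozen=True)
class GrimAutomaton:
    path: Sequence[Profile]   # prescribed profiles on the equilibrium path, tau = 0..T-1
    minmax_profile: Profile   # mutual minmax profile played forever after a deviation

    def initial_state(self) -> State:
        return 0              # corresponds to w_e(0)

    def output(self, state: State) -> Profile:
        # Output function: prescribed profile on the path, minmax profile in punishment.
        return self.minmax_profile if state == PUNISH else self.path[state]

    def transition(self, state: State, played: Profile) -> State:
        if state == PUNISH:
            return PUNISH     # the punishment state is absorbing
        if played == self.path[state]:
            return (state + 1) % len(self.path)   # advance cyclically along the path
        return PUNISH         # any deviation triggers permanent punishment
```

Stepping the automaton with the prescribed profiles cycles through the path states, while a single mismatch moves it to `PUNISH` for good, which mirrors the "mutual minmax forever" punishment.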

Now the only unknown is the users' action profiles in the equilibrium outcome path, i.e.,
$\{\mathbf{a}^\tau\}_{\tau=0}^{T-1}$. There are two requirements for $\{\mathbf{a}^\tau\}_{\tau=0}^{T-1}$. First, it results in an equilibrium outcome
path whose discounted average payoff is the target payoff $\mathbf{v}^\star$ under a given discount factor.
Second, it results in a strategy, described in the above automaton, that is an SPE under the given
discount factor. Theorem 2 shows how to design $\{\mathbf{a}^\tau\}_{\tau=0}^{T-1}$ for any target payoff $\mathbf{v}^\star \in \mathcal{P}_{\gamma}$ under
any discount factor $\delta \ge \bar{\delta}(\mathbf{v}^\star)$.

Theorem 2: For any target payoff $\mathbf{v}^\star \in \mathcal{P}_{\gamma}$ and any discount factor $\delta \ge \bar{\delta}(\mathbf{v}^\star)$, the users'
action profiles in the equilibrium outcome path, $\{\mathbf{a}^\tau\}_{\tau=0}^{T-1}$, can be generated by the algorithm in
Table I.

Proof: See [19, Appendix D]. ■ 

Remark 4: Note that any action profile $\mathbf{a}^\tau$ in the sequence is from the set $\{\tilde{\mathbf{a}}^i\}_{i=1}^{N}$. In other
words, only one user takes a nonzero action in each period. This greatly simplifies the monitoring
burden of the intervention device and the users. Actually, at each period, the inactive users, who
take zero actions, do not need to monitor, because the active user, who takes a nonzero action
at that period, can sense the interference and report to the intervention device the detection of
deviation. Then the intervention device can broadcast the detection of deviation to trigger the
punishment. In addition, no inactive user can gain from sending a false report to the intervention
device, because the intervention device only trusts the report from the active user.

V. Simulation Results 

In this section, we consider the flow control game in Example 3 to illustrate the performance
gain of using intervention in repeated games and the trade-off among the discount factor, the
minimum payoff guarantees, and the maximum intervention flow rate.

A. Performance Gain 

First, we compare the performance when the protocol designer solves the protocol design
problem (5) using four different schemes, namely greedy algorithms [24]-[26], one-shot games
with incentive schemes [11][15][23], repeated games without intervention, and repeated games
with intervention. The two example welfare functions we use are the sum payoff $\sum_{i \in \mathcal{N}} U_i$
and the absolute fairness $\min_{i \in \mathcal{N}} U_i$. Greedy algorithms achieve the NE, which may not satisfy
the minimum payoff guarantees. For one-shot games with incentive schemes, we assume that
the entire Pareto boundary of the set of pure-action payoffs can be achieved as NE by using
appropriate incentive schemes, in order to get the best performance achievable by this scheme.

1) Impact of the number of users: We compare how the performance scales with the number
of users under the four schemes. For simplicity, we study the symmetric case, where all the users
have the same throughput-delay trade-off parameter, equal to 3, and the same maximum flow rate normalized to 1
bit/s. We consider two scenarios: first, the server's capacity (service rate) increases linearly with
the number of users, i.e., $\mu = N$ bits/s, and second, the server's capacity is limited at 10 bits/s, i.e.,
$\mu = \min\{N, 10\}$ bits/s. The maximum intervention flow rate is $\bar{a}_0 = \max\{\mu - (N - 1), 0\}$ bits/s to
ensure the mutual minmax profile is an NE. We set the minimum payoff as 10% of the maximum
stage-game payoff Vi subject to the constraints (20), namely 7^ = min{0.1 -Vi, ji/N, v^} bits/s. 

Fig. 5 shows the sum payoff and the fairness achieved by different schemes. When the capacity
increases linearly with the number of users $N$, both the sum payoff and the fairness increase with
$N$ by using repeated games. In contrast, when using one-shot games with incentive schemes,
the sum payoff and fairness increase initially when the number of users is small, and decrease
when $N > 4$. A sum payoff or fairness of the value 0 means that the minimum payoff cannot
be guaranteed. This happens in one-shot games with incentive schemes when $N > 5$. The NE
payoff does not satisfy the minimum payoff guarantee with any number of users.

When the capacity is limited at 10 bits/s, for repeated games with intervention, the sum payoff
reaches the bottleneck of 10, and due to congestion, the fairness decreases when $N > 10$. For
repeated games without intervention, the sum payoff and the fairness decrease more rapidly, and
the minimum payoff guarantee cannot be met when $N > 15$. For one-shot games, the trend of
the sum payoff and fairness is similar to the case with increasing capacity.

In conclusion, using repeated games with intervention has a large performance gain over the 
other three schemes, in terms of both the sum payoff and the fairness. 

2) Impact of minimum payoff guarantees: We compare the performance of the four schemes
under different minimum payoff guarantees. The system parameters are the same as those in
Fig. 4. The maximum intervention flow rate is $\bar{a}_0 = 2.5$ bits/s to ensure the mutual minmax profile
is an NE. We set the same minimum payoff guarantee for all the users, namely $\gamma_i = \gamma_j,\ \forall i, j \in \mathcal{N}$.

Table II shows the sum payoff and the fairness achieved by the four schemes. We write "N/A"
when the minimum payoff guarantee cannot be satisfied. For repeated games, we show the
minimum discount factors allowed to achieve the optimal performance in parentheses next
to the performance metric. An immediate observation is the inefficiency of the NE, as expected.

When using the one-shot game model with incentive schemes, the performance loss compared to
using repeated games is small when the minimum payoff guarantee is small ($\gamma_i = 1$). However,
when the minimum payoff guarantee increases, using one-shot games is far from optimal in
terms of both the sum payoff and the fairness. Note that using one-shot games fails to satisfy
the large minimum payoff guarantee ($\gamma_i = 14$). In summary, using repeated games has a large
performance gain over using one-shot games in most cases, and is necessary when the minimum
payoff guarantee is large.

Under the specific parameters in this simulation, if we allow the discount factor to be sufficiently
close to 1, using intervention in repeated games has little performance gain over repeated
games without intervention, especially when the minimum payoff guarantee is large. This is
because the minmax payoff without intervention is already small, such that it is Pareto dominated
by some large minimum payoff guarantees. In this case, the advantage of using intervention
in repeated games is that it allows smaller discount factors to achieve the same or better
performance, compared to using repeated games without intervention. This is also confirmed
by Fig. 6, which shows the minimum discount factor allowed to achieve the target payoff
that maximizes the sum payoff, under different minimum payoffs $\gamma$ and different maximum
intervention flow rates.

B. Trade-off among $\delta$, $\gamma$, and $\bar{a}_0$

Consider the same system as that in Fig. 4 and Fig. 6. Suppose that the target payoff is the one 
that maximizes the sum payoff. In Fig. 7, we plot the trade-off between the required maximum 
intervention flow rate and the discount factor under different minimum payoff guarantees. In 
Fig. 8, we plot the trade-off between the required maximum intervention rate and the minimum 
payoff guarantees under different discount factors. The protocol designer can use these trade-off 
curves as guidelines for determining the maximum intervention flow rate required under different 
discount factors and minimum payoff guarantees. 

VI. Conclusion 

In this paper, we proposed a repeated game model generalized by intervention, which can 
be applied to a large variety of communication systems. We use repeated games to achieve the 
equilibrium payoffs that Pareto dominate some payoffs on the Pareto boundary of the nonconvex 
set of feasible payoffs in one-shot games. In addition, we combine conventional repeated games 
with intervention, and show that intervention enlarges the set of equilibrium payoffs when the 
discount factor is sufficiently close to 1. Then we consider the protocol design problem of 
maximizing the welfare function subject to minimum payoff guarantees. We derive the trade-off 
between the discount factor, the minimum payoff guarantees, and the intervention capability, 
and construct equilibrium strategies to achieve any target payoff as an SPE payoff. Simulation 
results show the great performance gain, in terms of sum payoff and absolute fairness, by using 
intervention in repeated games. 

Appendix A 
Proof of Proposition 1 

We prove the minmax folk theorem by constructing an equilibrium strategy profile presented
in the automaton in Fig. 3. Assume that the game has a mutual minmax profile $(\hat{a}_0, \hat{\mathbf{a}})$. First, we
prove that any pure-action payoff profile $\mathbf{u}(\tilde{a}_0, \tilde{\mathbf{a}})$ that Pareto dominates the minmax payoff profile
$\mathbf{u}(\hat{a}_0, \hat{\mathbf{a}})$ can be achieved as an SPE payoff. Then, we prove that any payoff profile $\mathbf{v} \in \mathcal{V}_w^{\dagger}$,
which may not be achieved by a pure-action profile, can be achieved as an SPE payoff.

Fig. 1. The automaton of the equilibrium strategy of the game with a mutual minmax profile $(\hat{a}_0, \hat{\mathbf{a}})$. Circles are states, where
$\{w_e(\tau)\}_{\tau=0}^{T-1}$ is the set of states in the equilibrium outcome path, and $\{w_p^i(\ell)\}_{\ell=0}^{L-1}$ is the set of states in the punishment phase
for user $i$. The initial state is $w_e(0)$. Solid arrows are the prescribed state transitions labeled by the action profiles leading to
the transitions. Dashed arrows are the state transitions when deviation happens. $a_i^*$ is user $i$'s best response to $\hat{a}_0$ and $\hat{\mathbf{a}}_{-i}$.
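For the reader's convenience, we rewrite the formal description of the automaton here:
• The set of states is $\mathcal{W} = \mathcal{W}_e \cup \mathcal{W}_p^1 \cup \cdots \cup \mathcal{W}_p^N$, where $\mathcal{W}_e = \{w_e(\tau) : 0 \le \tau \le T - 1\}$ is
the set of states in the equilibrium outcome path, and $\mathcal{W}_p^i = \{w_p^i(\ell) : 0 \le \ell \le L - 1\}$ is
the set of states in the punishment phase for user $i$. The initial state is $w^0 = w_e(0)$.
• The action profile at each state is specified by the following output function

$$f(w) = \begin{cases} (a_0^\tau, \mathbf{a}^\tau), & \text{if } w = w_e(\tau) \\ (\hat{a}_0, a_i^*, \hat{\mathbf{a}}_{-i}), & \text{if } w \in \mathcal{W}_p^i \end{cases} \tag{25}$$

where $a_i^*$ is user $i$'s best response to $\hat{a}_0$ and $\hat{\mathbf{a}}_{-i}$.
• The state transition is specified by the following transition rule

$$\lambda(w_e(\tau), (a_0, \mathbf{a})) = \begin{cases} w_p^i(0), & \text{if } a_i \neq a_i^\tau \text{ and } \mathbf{a}_{-i} = \mathbf{a}_{-i}^\tau \\ w_e(\tau + 1 \bmod T), & \text{otherwise} \end{cases} \tag{26}$$

and

$$\lambda(w_p^i(\ell), (a_0, \mathbf{a})) = \begin{cases} w_p^i(\ell + 1), & \text{if } \ell < L - 1 \text{ and } (a_0, \mathbf{a}) = (\hat{a}_0, a_i^*, \hat{\mathbf{a}}_{-i}) \\ w_e(0), & \text{if } \ell = L - 1 \text{ and } (a_0, \mathbf{a}) = (\hat{a}_0, a_i^*, \hat{\mathbf{a}}_{-i}) \\ w_p^i(0), & \text{if } a_i \neq a_i^* \text{ and } \mathbf{a}_{-i} = \hat{\mathbf{a}}_{-i} \\ w_p^j(0), & \text{if } j \neq i,\ a_j \neq \hat{a}_j,\ a_i = a_i^*, \text{ and } a_k = \hat{a}_k\ \forall k \neq i, j \end{cases} \tag{27}$$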

A. Achieving Pure-action Payoff As an SPE Payoff 

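We prove that for any pure-action payoff profile $\mathbf{u}(\tilde{a}_0, \tilde{\mathbf{a}})$ that Pareto dominates the minmax
payoff profile $\mathbf{u}(\hat{a}_0, \hat{\mathbf{a}})$, there exists $\underline{\delta} < 1$ and a strategy profile presented in the automaton in
Fig. 3, such that for all $\delta \in (\underline{\delta}, 1)$, the strategy profile is a subgame perfect equilibrium with
payoff $\mathbf{u}(\tilde{a}_0, \tilde{\mathbf{a}})$.

Since the payoff can be achieved by a pure-action profile $(\tilde{a}_0, \tilde{\mathbf{a}})$, the equilibrium outcome
path is repeating $(\tilde{a}_0, \tilde{\mathbf{a}})$ in every period. Hence, the automaton is simplified to have $T = 1$ and
$(a_0^0, \mathbf{a}^0) = (\tilde{a}_0, \tilde{\mathbf{a}})$.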

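Now we calculate the values of all the states in the automaton. For $w_e(0)$, we have

$$\mathbf{V}(w_e(0)) = (1 - \delta) \cdot \mathbf{u}(\tilde{a}_0, \tilde{\mathbf{a}}) + \delta \cdot \mathbf{V}(w_e(0)) \ \Rightarrow\ \mathbf{V}(w_e(0)) = \mathbf{u}(\tilde{a}_0, \tilde{\mathbf{a}}). \tag{28}$$

For $w_p^i(\ell)$, $0 \le \ell \le L - 1$, we have

$$\mathbf{V}(w_p^i(\ell)) = (1 - \delta) \cdot \mathbf{u}(\hat{a}_0, a_i^*, \hat{\mathbf{a}}_{-i}) + \delta \cdot \mathbf{V}(w_p^i(\ell + 1)) \tag{29}$$

$$= (1 - \delta) \cdot \frac{1 - \delta^{L-\ell-1}}{1 - \delta} \cdot \mathbf{u}(\hat{a}_0, a_i^*, \hat{\mathbf{a}}_{-i}) + \delta^{L-\ell-1} \cdot \mathbf{V}(w_p^i(L - 1)). \tag{30}$$

Since we can calculate $\mathbf{V}(w_p^i(L - 1))$ as

$$\mathbf{V}(w_p^i(L - 1)) = (1 - \delta) \cdot \mathbf{u}(\hat{a}_0, a_i^*, \hat{\mathbf{a}}_{-i}) + \delta \cdot \mathbf{V}(w_e(0)), \tag{31}$$

we have for $\ell = 0, \ldots, L - 1$,

$$\mathbf{V}(w_p^i(\ell)) = (1 - \delta^{L-\ell}) \cdot \mathbf{u}(\hat{a}_0, a_i^*, \hat{\mathbf{a}}_{-i}) + \delta^{L-\ell} \cdot \mathbf{u}(\tilde{a}_0, \tilde{\mathbf{a}}). \tag{32}$$

With the values of the states, we can derive the conditions under which the strategy prescribed
by this automaton is a subgame perfect equilibrium. To this end, we need to check that for any
state $w$ accessible from $w^0$, $f(w)$ is a Nash equilibrium of the normal-form game described by
the payoff function $g^w : A \to \mathbb{R}^N$, where

$$g^w(\mathbf{a}) = (1 - \delta) \cdot \mathbf{u}(a_0(w), \mathbf{a}) + \delta \cdot \mathbf{V}(\lambda(w, (a_0(w), \mathbf{a}))), \tag{33}$$

where $a_0(w)$ is the intervention action prescribed in $f(w)$.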
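
The punishment-state values derived above (a one-step recursion and its closed form) can be cross-checked numerically. The sketch below works with one user's scalar payoff component; the function names and arguments are hypothetical, with `u_punish` standing for the stage payoff received in each punishment period and `u_target` for the payoff on the equilibrium path.

```python
def punishment_values_by_recursion(u_punish, u_target, delta, L):
    """Values V(w_p(l)) for l = 0..L-1 from V(l) = (1-delta)*u_punish + delta*V(l+1).

    The state after the last punishment period is w_e(0), whose value is u_target.
    """
    values = [0.0] * L
    next_value = u_target
    for l in range(L - 1, -1, -1):          # work backwards from l = L-1
        values[l] = (1 - delta) * u_punish + delta * next_value
        next_value = values[l]
    return values


def punishment_value_closed_form(u_punish, u_target, delta, L, l):
    """Closed form: (1 - delta^(L-l)) * u_punish + delta^(L-l) * u_target."""
    weight = delta ** (L - l)
    return (1 - weight) * u_punish + weight * u_target
```

Comparing the two functions for each `l` confirms the geometric-sum step, and when `u_punish < u_target` the list returned by the recursion is increasing in `l`, so the first punishment state has the lowest value.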

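First, we check the incentive compatibility constraints for users to follow the action profile $\tilde{\mathbf{a}}$.
This can be done by checking if $\tilde{\mathbf{a}}$ is a Nash equilibrium of the normal-form game at the state
$w = w_e(0)$. The normal-form game at state $w = w_e(0)$ has payoff function

$$g^w(\mathbf{a}) = \begin{cases} (1 - \delta) \cdot \mathbf{u}(\tilde{a}_0, \tilde{\mathbf{a}}) + \delta \cdot \mathbf{V}(w_e(0)) = \mathbf{u}(\tilde{a}_0, \tilde{\mathbf{a}}), & \text{if } \mathbf{a} = \tilde{\mathbf{a}} \\ (1 - \delta) \cdot \mathbf{u}(\tilde{a}_0, \mathbf{a}) + \delta \cdot \mathbf{V}(w_p^i(0)), & \text{if } a_i \neq \tilde{a}_i \text{ and } \mathbf{a}_{-i} = \tilde{\mathbf{a}}_{-i} \end{cases} \tag{34}$$

The action profile $\tilde{\mathbf{a}}$ is a Nash equilibrium if and only if, for all $i \in \mathcal{N}$ and for all $a_i \in A_i$,

$$u_i(\tilde{a}_0, \tilde{\mathbf{a}}) \ge (1 - \delta) \cdot u_i(\tilde{a}_0, a_i, \tilde{\mathbf{a}}_{-i}) + \delta \cdot \left[ (1 - \delta^L) \cdot u_i(\hat{a}_0, \hat{\mathbf{a}}) + \delta^L \cdot u_i(\tilde{a}_0, \tilde{\mathbf{a}}) \right]. \tag{35}$$

Define $M = \max_{i, \mathbf{a}} u_i(\tilde{a}_0, \mathbf{a})$. Then it suffices to have, for all $i \in \mathcal{N}$,

$$(1 - \delta^{L+1}) \cdot u_i(\tilde{a}_0, \tilde{\mathbf{a}}) \ge (1 - \delta) \cdot M + \delta \cdot (1 - \delta^L) \cdot u_i(\hat{a}_0, \hat{\mathbf{a}}). \tag{36}$$

After rearranging the terms, we have

$$(\delta + \cdots + \delta^L) \cdot \left( u_i(\tilde{a}_0, \tilde{\mathbf{a}}) - u_i(\hat{a}_0, \hat{\mathbf{a}}) \right) \ge M - u_i(\tilde{a}_0, \tilde{\mathbf{a}}). \tag{37}$$

Second, we check the incentive compatibility constraints for staying in the punishment phase.
This is done by checking if $\hat{\mathbf{a}}$ is a Nash equilibrium of the normal-form games at the states
$w_p^i(0), \ldots, w_p^i(L - 1)$. Actually, we only need the incentive compatibility constraint to hold for
the game at state $w_p^i(0)$, because the value of state $w_p^i(0)$ is the lowest among all the states in
the punishment phase, and user $i$'s deviation to $a_i$ in any state in the punishment phase results
in the same payoff of

$$(1 - \delta) \cdot u_i(\hat{a}_0, a_i, \hat{\mathbf{a}}_{-i}) + \delta \cdot V_i(w_p^i(0)). \tag{38}$$

Hence, for the punishing action profile $\hat{\mathbf{a}}$ to be incentive compatible, we need for all $i \in \mathcal{N}$ and
for all $a_i \in A_i$,

$$V_i(w_p^i(0)) \ge (1 - \delta) \cdot u_i(\hat{a}_0, a_i, \hat{\mathbf{a}}_{-i}) + \delta \cdot V_i(w_p^i(0)), \tag{39}$$

which gives us for all $i \in \mathcal{N}$ and for all $a_i \in A_i$,

$$V_i(w_p^i(0)) = (1 - \delta^L) \cdot u_i(\hat{a}_0, \hat{\mathbf{a}}) + \delta^L \cdot u_i(\tilde{a}_0, \tilde{\mathbf{a}}) \ge u_i(\hat{a}_0, a_i, \hat{\mathbf{a}}_{-i}). \tag{40}$$

Since $u_i(\hat{a}_0, a_i, \hat{\mathbf{a}}_{-i}) \le \underline{v}_i^w$, it suffices to have for all $i \in \mathcal{N}$,

$$(1 - \delta^L) \cdot u_i(\hat{a}_0, \hat{\mathbf{a}}) + \delta^L \cdot u_i(\tilde{a}_0, \tilde{\mathbf{a}}) \ge \underline{v}_i^w. \tag{41}$$

Now we have two sets of constraints (37) and (41) for the discount factor $\delta$ and the length
of punishment $L$. We choose $L$ such that $L \cdot (u_i(\tilde{a}_0, \tilde{\mathbf{a}}) - u_i(\hat{a}_0, \hat{\mathbf{a}})) > M - u_i(\tilde{a}_0, \tilde{\mathbf{a}})$ for all $i$.
Then we can find a discount factor $\delta^p < 1$ such that (37) and (41) are satisfied for all $i$.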
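
The two-step choice above (first a punishment length, then a discount factor satisfying (37) and (41)) can be illustrated numerically. The sketch below is an assumption-laden example, not the paper's procedure: the function names are hypothetical, payoffs are passed as per-user lists, and the discount factor is found by a simple grid scan. The scan is valid because the left-hand sides of both constraints are nondecreasing in the discount factor when the target payoff exceeds the mutual-minmax payoff.

```python
import math


def punishment_length(u_target, u_mutual_minmax, M):
    """Smallest integer L >= 1 with L * (u_target_i - u_mm_i) > M - u_target_i for all i."""
    L = 1
    for ut, um in zip(u_target, u_mutual_minmax):
        gap = ut - um
        if gap <= 0:
            raise ValueError("target payoff must strictly exceed the mutual-minmax payoff")
        # Smallest integer strictly greater than (M - ut) / gap.
        L = max(L, math.floor((M - ut) / gap) + 1)
    return L


def constraints_hold(delta, L, u_target, u_mutual_minmax, v_minmax, M):
    """Check the on-path constraint (37) and the punishment constraint (41) for every user."""
    geometric = sum(delta ** k for k in range(1, L + 1))   # delta + ... + delta^L
    for ut, um, vw in zip(u_target, u_mutual_minmax, v_minmax):
        if geometric * (ut - um) < M - ut:                  # violates (37)
            return False
        if (1 - delta ** L) * um + delta ** L * ut < vw:    # violates (41)
            return False
    return True


def find_discount_factor(L, u_target, u_mutual_minmax, v_minmax, M, step=1e-3):
    """Return the first grid value of delta in (0, 1) at which both constraints hold, else None."""
    for k in range(1, int(round(1 / step))):
        delta = k * step
        if constraints_hold(delta, L, u_target, u_mutual_minmax, v_minmax, M):
            return delta
    return None
```

With target payoffs of 1, mutual-minmax payoffs of 0, minmax payoffs of 0.2, and `M = 2.5` for two symmetric users, the punishment length is 2 and the scan stops near 0.823, the point where $\delta + \delta^2$ first reaches 1.5.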

B. Achieving Any Feasible and Individually Rational Payoff 

We prove that any feasible and individually rational payoff $\mathbf{v} \in \mathcal{V}_w^{\dagger}$ can be achieved as an
SPE payoff. The difficulty is that the payoff starting from a certain period may be too low to
prevent users from deviating at that period. This difficulty is resolved by [21, Lemma 3.7.2],
which states that for any $\varepsilon > 0$, there exists $\delta' < 1$ such that for any payoff $\mathbf{v} \in \mathcal{V}_w^{\dagger}$ and any
discount factor $\delta \in (\delta', 1)$, there exists a sequence of pure action profiles that gives discounted
average payoff $\mathbf{v}$ and that has continuation payoffs within $\varepsilon$ of $\mathbf{v}$ at any period $t$. In other words,
the payoff starting from any period is approximately the same, as if generated by a pure-action
profile. Combining with the results in the previous subsection, there exists $\underline{\delta} = \max\{\delta^p, \delta'\}$,
such that any discount factor $\delta \in (\underline{\delta}, 1)$ can sustain $\mathbf{v} \in \mathcal{V}_w^{\dagger}$ as an SPE payoff.

Appendix B 
Proof of Proposition 2 

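We prove the folk theorem with player-specific punishments by constructing an equilibrium
strategy profile. First, we prove that any pure-action payoff profile $\mathbf{v} = \mathbf{u}(\tilde{a}_0, \tilde{\mathbf{a}})$ that allows
pure-action player-specific punishments $\{\mathbf{v}^i = \mathbf{u}(\tilde{a}_0(i), \tilde{\mathbf{a}}(i))\}_{i=1}^{N}$ can be achieved as an SPE
payoff. Then, we prove that any payoff profile $\mathbf{v} \in \mathcal{V}_w^{\dagger}$ that allows player-specific punishments,
which may not be achieved by pure-action profiles, can be achieved as an SPE payoff.

For the reader's convenience, we write the formal description of the automaton (when using
pure-action player-specific punishments) here:
• The set of states is $\mathcal{W} = \mathcal{W}_e \cup \mathcal{W}_p^1 \cup \cdots \cup \mathcal{W}_p^N$, where $\mathcal{W}_e = \{w_e(\tau) : 0 \le \tau \le T - 1\}$ is
the set of states in the equilibrium outcome path, and $\mathcal{W}_p^i = \{w_p^i(\ell) : 0 \le \ell \le L\}$ is the set
of states in the punishment phase for user $i$. The initial state is $w^0 = w_e(0)$.
• The action profile at each state is specified by the following output function

$$f(w) = \begin{cases} (a_0^\tau, \mathbf{a}^\tau), & \text{if } w = w_e(\tau) \\ (a_0^i, a_i^*, \mathbf{a}_{-i}^i), & \text{if } w = w_p^i(\ell) \text{ and } \ell < L \\ (\tilde{a}_0(i), \tilde{\mathbf{a}}(i)), & \text{if } w = w_p^i(L) \end{cases} \tag{42}$$

where $(a_0^i, a_i^*, \mathbf{a}_{-i}^i)$ is the action profile that minmaxes user $i$, namely

$$\min_{a_0, \mathbf{a}_{-i}} \max_{a_i} u_i(a_0, a_i, \mathbf{a}_{-i}) = u_i(a_0^i, a_i^*, \mathbf{a}_{-i}^i) = \underline{v}_i^w. \tag{43}$$

• The state transition is specified by the following transition rule

$$\lambda(w_e(\tau), (a_0, \mathbf{a})) = \begin{cases} w_p^i(0), & \text{if } a_i \neq a_i^\tau \text{ and } \mathbf{a}_{-i} = \mathbf{a}_{-i}^\tau \\ w_e(\tau + 1 \bmod T), & \text{otherwise} \end{cases} \tag{44}$$

and

$$\lambda(w_p^i(\ell), (a_0, \mathbf{a})) = \begin{cases} w_p^i(\ell + 1), & \text{if } \ell < L \text{ and } (a_0, \mathbf{a}) = (a_0^i, a_i^*, \mathbf{a}_{-i}^i) \\ w_p^i(L), & \text{if } \ell = L \text{ and } (a_0, \mathbf{a}) = (\tilde{a}_0(i), \tilde{\mathbf{a}}(i)) \\ w_p^i(0), & \text{if } a_i \neq a_i^*,\ \mathbf{a}_{-i} = \mathbf{a}_{-i}^i, \text{ and } \ell < L \\ w_p^j(0), & \text{if } j \neq i,\ a_j \neq a_j^i,\ a_i = a_i^*,\ a_k = a_k^i\ \forall k \neq i, j, \text{ and } \ell < L \\ w_p^j(0), & \text{if } a_j \neq \tilde{a}_j(i) \text{ and } \mathbf{a}_{-j} = \tilde{\mathbf{a}}_{-j}(i), \text{ and } \ell = L \end{cases}$$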
A. Achieving Pure-action Payoff As an SPE Payoff

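We prove that for any pure-action payoff profile $\mathbf{v} = \mathbf{u}(\tilde{a}_0, \tilde{\mathbf{a}})$ that allows pure-action player-specific
punishments $\{\mathbf{v}^i = \mathbf{u}(\tilde{a}_0(i), \tilde{\mathbf{a}}(i))\}_{i=1}^{N}$, there exists $\underline{\delta} < 1$ and a strategy profile described
in the above automaton, such that for all $\delta \in (\underline{\delta}, 1)$, the strategy profile is a subgame perfect
equilibrium with payoff $\mathbf{u}(\tilde{a}_0, \tilde{\mathbf{a}})$.

Since the payoff can be achieved by the pure-action profile $(\tilde{a}_0, \tilde{\mathbf{a}})$, the equilibrium outcome path
is repeating $(\tilde{a}_0, \tilde{\mathbf{a}})$ in every period. Hence, the automaton is simplified to have $T = 1$ and
$(a_0^0, \mathbf{a}^0) = (\tilde{a}_0, \tilde{\mathbf{a}})$.

Now we calculate the values of all the states in the automaton. For $w_e(0)$, we have

$$\mathbf{V}(w_e(0)) = (1 - \delta) \cdot \mathbf{u}(\tilde{a}_0, \tilde{\mathbf{a}}) + \delta \cdot \mathbf{V}(w_e(0)) \ \Rightarrow\ \mathbf{V}(w_e(0)) = \mathbf{u}(\tilde{a}_0, \tilde{\mathbf{a}}). \tag{45}$$

For $w_p^i(L)$, we have

$$\mathbf{V}(w_p^i(L)) = (1 - \delta) \cdot \mathbf{u}(\tilde{a}_0(i), \tilde{\mathbf{a}}(i)) + \delta \cdot \mathbf{V}(w_p^i(L)) \ \Rightarrow\ \mathbf{V}(w_p^i(L)) = \mathbf{u}(\tilde{a}_0(i), \tilde{\mathbf{a}}(i)). \tag{46}$$

For $0 \le \ell \le L - 1$, we have

$$\mathbf{V}(w_p^i(\ell)) = (1 - \delta) \cdot \mathbf{u}(a_0^i, a_i^*, \mathbf{a}_{-i}^i) + \delta \cdot \mathbf{V}(w_p^i(\ell + 1)) \tag{47}$$

$$= (1 - \delta) \cdot \frac{1 - \delta^{L-\ell-1}}{1 - \delta} \cdot \mathbf{u}(a_0^i, a_i^*, \mathbf{a}_{-i}^i) + \delta^{L-\ell-1} \cdot \mathbf{V}(w_p^i(L - 1)). \tag{48}$$

Since we can calculate $\mathbf{V}(w_p^i(L - 1))$ as

$$\mathbf{V}(w_p^i(L - 1)) = (1 - \delta) \cdot \mathbf{u}(a_0^i, a_i^*, \mathbf{a}_{-i}^i) + \delta \cdot \mathbf{V}(w_p^i(L)), \tag{49}$$

we have for $\ell = 0, \ldots, L - 1$,

$$\mathbf{V}(w_p^i(\ell)) = (1 - \delta^{L-\ell}) \cdot \mathbf{u}(a_0^i, a_i^*, \mathbf{a}_{-i}^i) + \delta^{L-\ell} \cdot \mathbf{u}(\tilde{a}_0(i), \tilde{\mathbf{a}}(i)). \tag{50}$$

With the values of the states, we can derive the conditions under which the strategy prescribed
by this automaton is a subgame perfect equilibrium. To this end, we need to check that for any
state $w$ accessible from $w^0$, $f(w)$ is a Nash equilibrium of the normal-form game described by
the payoff function $g^w : A \to \mathbb{R}^N$, where

$$g^w(\mathbf{a}) = (1 - \delta) \cdot \mathbf{u}(a_0(w), \mathbf{a}) + \delta \cdot \mathbf{V}(\lambda(w, (a_0(w), \mathbf{a}))), \tag{51}$$

where $a_0(w)$ is the intervention action prescribed in $f(w)$.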

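First, we check the incentive compatibility constraints for users to follow the action profile $\tilde{\mathbf{a}}$.
This can be done by checking if $\tilde{\mathbf{a}}$ is a Nash equilibrium of the normal-form game at the state
$w = w_e(0)$. The normal-form game at state $w = w_e(0)$ has payoff function

$$g^w(\mathbf{a}) = \begin{cases} (1 - \delta) \cdot \mathbf{u}(\tilde{a}_0, \tilde{\mathbf{a}}) + \delta \cdot \mathbf{V}(w_e(0)) = \mathbf{u}(\tilde{a}_0, \tilde{\mathbf{a}}), & \text{if } \mathbf{a} = \tilde{\mathbf{a}} \\ (1 - \delta) \cdot \mathbf{u}(\tilde{a}_0, \mathbf{a}) + \delta \cdot \mathbf{V}(w_p^i(0)), & \text{if } a_i \neq \tilde{a}_i \text{ and } \mathbf{a}_{-i} = \tilde{\mathbf{a}}_{-i} \end{cases} \tag{52}$$

The action profile $\tilde{\mathbf{a}}$ is a Nash equilibrium if and only if, for all $i \in \mathcal{N}$ and for all $a_i \in A_i$,

$$u_i(\tilde{a}_0, \tilde{\mathbf{a}}) \ge (1 - \delta) \cdot u_i(\tilde{a}_0, a_i, \tilde{\mathbf{a}}_{-i}) + \delta \cdot \left[ (1 - \delta^L) \cdot u_i(a_0^i, a_i^*, \mathbf{a}_{-i}^i) + \delta^L \cdot u_i(\tilde{a}_0(i), \tilde{\mathbf{a}}(i)) \right]. \tag{53}$$

Define $M = \max_{i, a_0, \mathbf{a}} u_i(a_0, \mathbf{a})$. Then it suffices to have, for all $i \in \mathcal{N}$,

$$v_i \ge (1 - \delta) \cdot M + \delta \cdot \left[ (1 - \delta^L) \cdot \underline{v}_i^w + \delta^L \cdot v_i^i \right]. \tag{54}$$

After rearranging the terms, we have

$$\delta (1 - \delta^L) \cdot (v_i - \underline{v}_i^w) + \delta^{L+1} \cdot (v_i - v_i^i) \ge (1 - \delta) \cdot (M - v_i). \tag{55}$$

Note that for fixed $L$, the left hand side is strictly positive for any $\delta \in (0, 1)$, and the right hand
side goes to 0 when $\delta \to 1$.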

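Second, we check the incentive compatibility constraints for staying in the punishment phase.
For the states $w_p^i(0), \ldots, w_p^i(L - 1)$, we check if $(a_0^i, a_i^*, \mathbf{a}_{-i}^i)$ is a Nash equilibrium of the
corresponding normal-form games. The normal-form game at the state $w = w_p^i(\ell)$ for $0 \le \ell \le L - 1$
has the payoff function

$$g^w(\mathbf{a}) = \begin{cases} (1 - \delta) \cdot \mathbf{u}(a_0^i, \mathbf{a}) + \delta \cdot \mathbf{V}(w_p^i(0)), & \text{if } a_i \neq a_i^*,\ \mathbf{a}_{-i} = \mathbf{a}_{-i}^i \\ (1 - \delta) \cdot \mathbf{u}(a_0^i, \mathbf{a}) + \delta \cdot \mathbf{V}(w_p^j(0)), & \text{if } j \neq i,\ a_j \neq a_j^i,\ a_i = a_i^*,\ a_k = a_k^i\ \forall k \neq i, j \\ (1 - \delta) \cdot \mathbf{u}(a_0^i, \mathbf{a}) + \delta \cdot \mathbf{V}(w_p^i(\ell + 1)), & \text{if } \mathbf{a} = (a_i^*, \mathbf{a}_{-i}^i) \end{cases}$$

The action profile $(a_i^*, \mathbf{a}_{-i}^i)$ is a Nash equilibrium if, for all $j \in \mathcal{N}$,

$$V_j(w_p^i(\ell)) \ge (1 - \delta) \cdot M + \delta \cdot V_j(w_p^j(0)), \tag{56}$$

which is equivalent to

$$(1 - \delta^{L-\ell}) \cdot u_j(a_0^i, a_i^*, \mathbf{a}_{-i}^i) + \delta^{L-\ell} \cdot u_j(\tilde{a}_0(i), \tilde{\mathbf{a}}(i)) \ge (1 - \delta) \cdot M + \delta \cdot \left[ (1 - \delta^L) \cdot u_j(a_0^j, a_j^*, \mathbf{a}_{-j}^j) + \delta^L \cdot u_j(\tilde{a}_0(j), \tilde{\mathbf{a}}(j)) \right].$$

The above inequality can be further simplified as

$$(1 - \delta^{L-\ell}) \cdot u_j(a_0^i, a_i^*, \mathbf{a}_{-i}^i) + \delta^{L-\ell} \cdot v_j^i \ge (1 - \delta) \cdot M + \delta \cdot \left[ (1 - \delta^L) \cdot \underline{v}_j^w + \delta^L \cdot v_j^j \right].$$

After rearranging the terms, we have

$$\delta^{L+1} \cdot (v_j^i - v_j^j) \ge (1 - \delta)(M - u_j(a_0^i, a_i^*, \mathbf{a}_{-i}^i)) + \delta (1 - \delta^{L-\ell-1})(\underline{v}_j^w - u_j(a_0^i, a_i^*, \mathbf{a}_{-i}^i)), \tag{57}$$

which holds for a fixed $L$ when $\delta$ is large enough.

For the state $w_p^i(L)$, we check if $\tilde{\mathbf{a}}(i)$ is a Nash equilibrium of the corresponding normal-form
game. The normal-form game at the state $w = w_p^i(L)$ has the payoff function

$$g^w(\mathbf{a}) = \begin{cases} (1 - \delta) \cdot \mathbf{u}(\tilde{a}_0(i), \mathbf{a}) + \delta \cdot \mathbf{V}(w_p^j(0)), & \text{if } a_j \neq \tilde{a}_j(i),\ \mathbf{a}_{-j} = \tilde{\mathbf{a}}_{-j}(i) \\ (1 - \delta) \cdot \mathbf{u}(\tilde{a}_0(i), \mathbf{a}) + \delta \cdot \mathbf{V}(w_p^i(L)), & \text{if } \mathbf{a} = \tilde{\mathbf{a}}(i) \end{cases}$$

The action profile $\tilde{\mathbf{a}}(i)$ is a Nash equilibrium if, for all $j \in \mathcal{N}$,

$$V_j(w_p^i(L)) \ge (1 - \delta) \cdot M + \delta \cdot V_j(w_p^j(0)), \tag{58}$$

which is equivalent to

$$v_j^i \ge (1 - \delta) \cdot M + \delta \cdot \left[ (1 - \delta^L) \cdot u_j(a_0^j, a_j^*, \mathbf{a}_{-j}^j) + \delta^L \cdot u_j(\tilde{a}_0(j), \tilde{\mathbf{a}}(j)) \right] = (1 - \delta) \cdot M + \delta \cdot \left[ (1 - \delta^L) \cdot \underline{v}_j^w + \delta^L \cdot v_j^j \right]. \tag{59}$$

After rearranging the terms, we have

$$\delta (1 - \delta^L) \cdot (v_j^i - \underline{v}_j^w) + \delta^{L+1} \cdot (v_j^i - v_j^j) \ge (1 - \delta) \cdot (M - v_j^i). \tag{60}$$

When $j \neq i$, we have

$$\delta (1 - \delta^L) \cdot (v_j^i - \underline{v}_j^w) + \delta^{L+1} \cdot (v_j^i - v_j^j) \ge (1 - \delta) \cdot (M - v_j^i), \tag{61}$$

which holds true for any fixed $L$ when $\delta \to 1$ for the same reason as (57). When $j = i$, we have

$$(\delta + \cdots + \delta^L) \cdot (v_j^j - \underline{v}_j^w) \ge M - v_j^j. \tag{62}$$

Now we have four sets of constraints (55), (57), (61), and (62) for the discount factor $\delta$ and
the length of punishment $L$. For any fixed $L$, (55) and (61) hold true when $\delta \to 1$, because
the left hand sides of both inequalities are strictly positive and the right hand sides of both
inequalities go to 0. For any fixed $L$, (57) also holds true when $\delta \to 1$, because the left hand
side is strictly positive and the right hand side goes to 0. For (62) to hold true, we choose $L$
such that $L \cdot (v_i^i - \underline{v}_i^w) > M - v_i^i$ for all $i$. Then we can find a discount factor $\delta^p < 1$ such that
all the four sets of constraints are satisfied.

B. Achieving Any Feasible and Individually Rational Payoff

We prove that any feasible and individually rational payoff $\mathbf{v} \in \mathcal{V}_w^{\dagger}$ that allows player-specific
punishments $\{\mathbf{v}^i\}_{i=1}^{N}$ can be achieved as an SPE payoff. The difficulty is that starting from a
certain period, the payoff on the equilibrium path or the payoff in the player-specific punishments
may be too low to prevent users from deviating at that period. This difficulty is resolved by [21,
Lemma 3.7.2], which states that for any $\varepsilon > 0$, there exists $\delta' < 1$ (resp. $\delta'^{i} < 1$) such that for
any payoff $\mathbf{v} \in \mathcal{V}_w^{\dagger}$ (resp. $\mathbf{v}^i$) and any discount factor $\delta \in (\delta', 1)$ (resp. $\delta \in (\delta'^{i}, 1)$), there exists
a sequence of pure action profiles that gives discounted average payoff $\mathbf{v}$ (resp. $\mathbf{v}^i$) and that has
continuation payoffs within $\varepsilon$ of $\mathbf{v}$ (resp. $\mathbf{v}^i$) at any period $t$. In other words, the payoff starting
from any period is approximately the same, as if generated by a pure-action profile. Combining
with the results in the previous subsection, there exists $\underline{\delta} = \max\{\delta^p, \delta', \max_{i \in \mathcal{N}} \delta'^{i}\}$, such that
any discount factor $\delta \in (\underline{\delta}, 1)$ can sustain $\mathbf{v} \in \mathcal{V}_w^{\dagger}$ as an SPE payoff.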

Appendix C 
Proof of Theorem 1 

We state the outline of the proof first. The proof relies heavily on the concept of self-generating
sets. Simply put, a self-generating set, associated with a discount factor, is a set in
which every payoff is an SPE payoff under the associated discount factor.* Any self-generating
set has a minimum discount factor associated with it; any discount factor that is larger than
the minimum one can be associated with that self-generating set. The idea of the proof is to
find the "optimal" self-generating set, the one with the smallest associated minimum discount
factor, among all the self-generating sets that include the target payoff. In order to find such a
self-generating set, we first derive the minimum discount factor associated with a self-generating
set. Then we minimize the minimum discount factor associated with a self-generating set, over
all the self-generating sets that include the target payoff. Note that for the sake of analytical
tractability, we confine our search to a special class of self-generating sets, which is the reason
why we obtain an upper bound of the minimum discount factor to support a target payoff,
instead of the minimum discount factor itself.

*We refer interested readers to [21, Section 2.5.1] for the definition of self-generating sets.

A. Minimum Discount Factor to Support Self-generating Sets

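First, we calculate the minimum discount factor associated with a self-generating set, which
can be very difficult for an arbitrary self-generating set. To obtain analytical results, we consider
the self-generating sets of the following form:

$$\mathcal{W}_{\mu} = \left\{ \mathbf{v} \in \mathcal{V}^{\dagger} : \sum_{i=1}^{N} (v_i / \bar{v}_i) = 1,\ v_i \ge \mu_i,\ \forall i \in \mathcal{N} \right\} \cup \{\underline{\mathbf{v}}^w\}, \tag{63}$$

where $\underline{\mathbf{v}}^w$ is the minmax payoff profile with the $i$th element being $\underline{v}_i^w$. For any target payoff on the
Pareto boundary of the feasible set of the protocol design problem, we can find a self-generating
set $\mathcal{W}_{\mu}$ defined as above to include it.

According to [21, Definition 2.5.3], a set $\mathcal{W}_{\mu}$ is self-generating if every payoff in $\mathcal{W}_{\mu}$ is pure-action
decomposable on $\mathcal{W}_{\mu}$. A payoff $\mathbf{v}$ is pure-action decomposable on $\mathcal{W}_{\mu}$ if there exists a
pure action profile $(a_0^*, \mathbf{a}^*)$ and a specification of continuation promises $\boldsymbol{\gamma} : A \to \mathcal{W}_{\mu}$ such that
for all $i \in \mathcal{N}$ and for all $a_i \in A_i$,

$$v_i = (1 - \delta) u_i(a_0^*, \mathbf{a}^*) + \delta \gamma_i(\mathbf{a}^*) \ge (1 - \delta) u_i(a_0^*, a_i, \mathbf{a}_{-i}^*) + \delta \gamma_i(a_i, \mathbf{a}_{-i}^*). \tag{64}$$

By Assumption 1, the mutual minmax payoff profile $\underline{\mathbf{v}}^w$ is a Nash equilibrium payoff profile.
Hence, regardless of the discount factor, $\underline{\mathbf{v}}^w$ can be pure-action decomposed by the mutual
minmax profile $(\hat{a}_0, \hat{\mathbf{a}})$ and the continuation promise $\boldsymbol{\gamma}(\mathbf{a}) = \underline{\mathbf{v}}^w$ for all $\mathbf{a} \in A$.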

For any payoff profile v e ^ \ v"', we first prove that v must be decomposed by a pure 
action in {(oq, a^), . . . , (oq, a^)}, and then find out the conditions on the discount factor under 
which V is pure-action decomposable on 

Lemma 1: Any payoff profile v e ^ other than v"' must be decomposed by a pure action 
in {(ao,ai),...,(ao,a^)}. 

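Proof: From $\mathbf{v} = (1-\delta)\, \mathbf{u}(a_0^*, \mathbf{a}^*) + \delta\, \boldsymbol{\gamma}(\mathbf{a}^*)$, we know that $\mathbf{v}$ is a convex combination of
$\mathbf{u}(a_0^*, \mathbf{a}^*)$ and $\boldsymbol{\gamma}(\mathbf{a}^*)$. Since $\mathbf{v}$ and $\boldsymbol{\gamma}(\mathbf{a}^*)$ are both in $\mathcal{W}_{\boldsymbol{\mu}}$, we have

$$\sum_{i=1}^{N} \frac{v_i}{\bar{v}_i} = \sum_{i=1}^{N} \frac{\gamma_i(\mathbf{a}^*)}{\bar{v}_i} = 1. \tag{65}$$

Hence, we have $\sum_{i=1}^{N} u_i(a_0^*, \mathbf{a}^*)/\bar{v}_i = 1$. The pure actions that satisfy this condition must come
from the set $\{(\underline{a}_0, \tilde{\mathbf{a}}^1), \ldots, (\underline{a}_0, \tilde{\mathbf{a}}^N)\}$. ■

Suppose $\mathbf{v}$ is decomposed by $(\underline{a}_0, \tilde{\mathbf{a}}^i)$. Let $y_{ij}$ be the maximum payoff that user $j$ can get by
deviating from $(\underline{a}_0, \tilde{\mathbf{a}}^i)$, namely

$$y_{ij} = \max_{a_j \in A_j} u_j(\underline{a}_0, a_j, \tilde{\mathbf{a}}^i_{-j}). \tag{66}$$
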
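Then the constraint on pure-action decomposability can be written as

$$v_j = (1-\delta)\, u_j(\underline{a}_0, \tilde{\mathbf{a}}^i) + \delta\, \gamma_j(\tilde{\mathbf{a}}^i) \ge (1-\delta)\, u_j(\underline{a}_0, a_j, \tilde{\mathbf{a}}^i_{-j}) + \delta\, \gamma_j(a_j, \tilde{\mathbf{a}}^i_{-j}), \quad \forall j \in \mathcal{N}. \tag{67}$$

As long as we set $\gamma_i(\mathbf{a}) = \gamma_i(\tilde{\mathbf{a}}^i)$ for all $\mathbf{a} \in A$, user $i$ will have no incentive to deviate from
$(\underline{a}_0, \tilde{\mathbf{a}}^i)$. Therefore, we only need to consider $j \ne i$. To get the minimum discount factor, we let
$\gamma_j(\mathbf{a}) = \underline{v}_j$ for all $\mathbf{a} \ne \tilde{\mathbf{a}}^i$. Then the above constraint simplifies to

$$v_j = (1-\delta) \cdot 0 + \delta\, \gamma_j(\tilde{\mathbf{a}}^i) \ge (1-\delta)\, y_{ij} + \delta\, \underline{v}_j, \quad \forall j \ne i, \tag{68}$$

which is equivalent to

$$\delta \ge \frac{y_{ij} - v_j}{y_{ij} - \underline{v}_j}, \quad \forall j \ne i. \tag{69}$$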

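Besides the above incentive constraint, the discount factor should ensure that there exists
$\boldsymbol{\gamma}(\tilde{\mathbf{a}}^i) \in \mathcal{W}_{\boldsymbol{\mu}}$ such that $\mathbf{v} = (1-\delta)\, \mathbf{u}(\underline{a}_0, \tilde{\mathbf{a}}^i) + \delta\, \boldsymbol{\gamma}(\tilde{\mathbf{a}}^i)$. In other words, we need

$$\mu_i \le \gamma_i(\tilde{\mathbf{a}}^i) = \bar{v}_i - \frac{\bar{v}_i - v_i}{\delta} \le v_i \iff \frac{\bar{v}_i - v_i}{\bar{v}_i - \mu_i} \le \delta \le 1, \tag{70}$$

and

$$\mu_j \le \gamma_j(\tilde{\mathbf{a}}^i) = \frac{v_j}{\delta} \le \bar{v}_j \iff \frac{v_j}{\bar{v}_j} \le \delta \le \frac{v_j}{\mu_j}, \quad \forall j \ne i. \tag{71}$$

Because $\frac{\bar{v}_i - v_i}{\bar{v}_i - \mu_i} \ge \frac{\bar{v}_i - v_i}{\bar{v}_i} = \sum_{j \ne i} \frac{v_j}{\bar{v}_j} \ge \frac{v_j}{\bar{v}_j}$ for $j \ne i$, the above two constraints simplify to

$$\delta \ge \frac{\bar{v}_i - v_i}{\bar{v}_i - \mu_i}. \tag{72}$$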

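Combining (69) and (72), the minimum discount factor under which $\mathbf{v}$ is pure-action
decomposable by $(\underline{a}_0, \tilde{\mathbf{a}}^i)$ on $\mathcal{W}_{\boldsymbol{\mu}}$ is

$$\max\left\{ \frac{\bar{v}_i - v_i}{\bar{v}_i - \mu_i},\ \max_{j \ne i} \frac{y_{ij} - v_j}{y_{ij} - \underline{v}_j} \right\}. \tag{73}$$

For a given payoff $\mathbf{v}$, we should choose an action profile from $\{(\underline{a}_0, \tilde{\mathbf{a}}^1), \ldots, (\underline{a}_0, \tilde{\mathbf{a}}^N)\}$, so
that the discount factor is minimized. Specifically, the minimum discount factor under which $\mathbf{v}$
is decomposed on $\mathcal{W}_{\boldsymbol{\mu}}$ is

$$\min_{i \in \mathcal{N}} \max\left\{ \frac{\bar{v}_i - v_i}{\bar{v}_i - \mu_i},\ \max_{j \ne i} \frac{y_{ij} - v_j}{y_{ij} - \underline{v}_j} \right\}. \tag{74}$$

As a result, the minimum discount factor under which $\mathcal{W}_{\boldsymbol{\mu}}$ is self-generating is

$$\delta_{\boldsymbol{\mu}} = \max_{\mathbf{v} \in \mathcal{W}_{\boldsymbol{\mu}}} \min_{i \in \mathcal{N}} \max\left\{ \frac{\bar{v}_i - v_i}{\bar{v}_i - \mu_i},\ \max_{j \ne i} \frac{y_{ij} - v_j}{y_{ij} - \underline{v}_j} \right\}, \tag{75}$$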

which can be obtained analytically according to the following lemma. 

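Lemma 2: The minimum discount factor under which $\mathcal{W}_{\boldsymbol{\mu}}$ is self-generating, defined in (75), is

$$\delta_{\boldsymbol{\mu}} = \max\left\{ \frac{N-1}{N - \sum_{i=1}^{N} \mu_i/\bar{v}_i},\ \max_{i \in \mathcal{N}} \max_{j \ne i} \frac{y_{ij} - \mu_j}{y_{ij} - \underline{v}_j} \right\}. \tag{76}$$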



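Proof: To simplify the notation, we define a matrix $\mathbf{X}(\mathbf{v}) \in \mathbb{R}^{N \times N}$, whose diagonal elements
are defined as

$$x_{ii}(\mathbf{v}) = \frac{\bar{v}_i - v_i}{\bar{v}_i - \mu_i}, \quad \forall i \in \mathcal{N}, \tag{77}$$

and whose off-diagonal elements are defined as

$$x_{ij}(\mathbf{v}) = \frac{y_{ij} - v_j}{y_{ij} - \underline{v}_j}, \quad \forall i \in \mathcal{N},\ \forall j \ne i. \tag{78}$$

Note that one important property of $\mathbf{X}(\mathbf{v})$ is

$$x_{jj}(\mathbf{v}) = \frac{\bar{v}_j - v_j}{\bar{v}_j - \mu_j} \ge \frac{y_{ij} - v_j}{y_{ij} - \underline{v}_j} = x_{ij}(\mathbf{v}), \quad \forall i \ne j \text{ and } \forall \mathbf{v}. \tag{79}$$

The optimization problem in (75) is equivalent to

$$\delta_{\boldsymbol{\mu}} = \max_{\mathbf{v} \in \mathcal{W}_{\boldsymbol{\mu}}} \min_{i \in \mathcal{N}} \max_{j \in \mathcal{N}} x_{ij}(\mathbf{v}). \tag{80}$$

The optimal value $\delta_{\boldsymbol{\mu}}$ must be strictly smaller than 1. This is because $\delta_{\boldsymbol{\mu}} = 1$ only if there exists
a $\mathbf{v}$ such that $\max_{j \in \mathcal{N}} x_{ij}(\mathbf{v}) = 1$ for all $i \in \mathcal{N}$, which is possible only if $v_i = \mu_i$ for all $i \in \mathcal{N}$.
But such a $\mathbf{v}$ is not in $\mathcal{W}_{\boldsymbol{\mu}}$ because $\sum_{i=1}^{N} (\mu_i/\bar{v}_i) < 1$.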

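Without compromising optimality, we consider the solution $\mathbf{v}^*$ to the optimization problem
(80) as one of the following two types:

• Type-1 solutions: at the optimal solution $\mathbf{v}^*$, $x_{ii}(\mathbf{v}^*) = \delta_{\boldsymbol{\mu}}$ for all $i \in \mathcal{N}$.

• Type-2 solutions: at the optimal solution $\mathbf{v}^*$, there exist $i$ and $j$ ($i \ne j$) such that
$\delta_{\boldsymbol{\mu}} = x_{ij}(\mathbf{v}^*)$, but there exists no $i$ such that $\delta_{\boldsymbol{\mu}} = x_{ii}(\mathbf{v}^*)$.

The reason why we only need to consider the above two types is as follows. Assume that at
the optimal solution $\mathbf{v}^*$, there exists $i^* \in \mathcal{N}$ such that $\delta_{\boldsymbol{\mu}} = x_{i^*i^*}(\mathbf{v}^*)$. Under this assumption,
we claim that unless $x_{ii}(\mathbf{v}^*) = \delta_{\boldsymbol{\mu}}$ for all $i \in \mathcal{N}$, we can always find another solution $\mathbf{v}'$ such
that $\min_{i \in \mathcal{N}} \max_{j \in \mathcal{N}} x_{ij}(\mathbf{v}') \ge \min_{i \in \mathcal{N}} \max_{j \in \mathcal{N}} x_{ij}(\mathbf{v}^*)$ and $x_{i^*i^*}(\mathbf{v}') > \delta_{\boldsymbol{\mu}}$. As a result, we only
need to consider the solution $\mathbf{v}^*$ with $x_{ii}(\mathbf{v}^*) = \delta_{\boldsymbol{\mu}}$ for all $i \in \mathcal{N}$ or $\mathbf{v}^*$ with $x_{ii}(\mathbf{v}^*) \ne \delta_{\boldsymbol{\mu}}$ for
any $i \in \mathcal{N}$.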

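Define the set $\mathcal{I} = \{i \in \mathcal{N} : x_{ii}(\mathbf{v}^*) = \delta_{\boldsymbol{\mu}}\}$. In the following, we prove our claim that if $\mathcal{I}$ is
nonempty and is a strict subset of $\mathcal{N}$ (i.e., $\mathcal{N} \setminus \mathcal{I}$ is nonempty), we can always find another solution
$\mathbf{v}'$ such that $\min_{i \in \mathcal{N}} \max_{j \in \mathcal{N}} x_{ij}(\mathbf{v}') \ge \min_{i \in \mathcal{N}} \max_{j \in \mathcal{N}} x_{ij}(\mathbf{v}^*)$ and $x_{ii}(\mathbf{v}') > \delta_{\boldsymbol{\mu}}$ for $i \in \mathcal{I}$.

• Suppose that there exists $i' \in \mathcal{N}$ such that $x_{i'i'}(\mathbf{v}^*) < \delta_{\boldsymbol{\mu}}$. Then define a new payoff profile
$\mathbf{v}' \in \mathcal{W}_{\boldsymbol{\mu}}$ with $v_i' = v_i^* - \varepsilon \cdot \bar{v}_i$ for all $i \in \mathcal{I}$† and $v_{i'}' = v_{i'}^* + |\mathcal{I}| \cdot \varepsilon \cdot \bar{v}_{i'}$, where $\varepsilon > 0$ is
small enough such that $v_i' \in (\mu_i, \bar{v}_i)$ for all $i \in \mathcal{N}$. Since $x_{ji'}(\mathbf{v}^*) \le x_{i'i'}(\mathbf{v}^*) < \delta_{\boldsymbol{\mu}}$ for all
$j \ne i'$ and $\max_{j \in \mathcal{N}} x_{ij}(\mathbf{v}^*) \ge \delta_{\boldsymbol{\mu}}$ for all $i \in \mathcal{N}$, we know that $x_{ii'}(\mathbf{v}^*) < \max_{j \in \mathcal{N}} x_{ij}(\mathbf{v}^*)$ for
all $i \in \mathcal{N}$. Hence, increasing $v_{i'}^*$ to $v_{i'}'$ does not lower the maximum over each row of the
matrix $\mathbf{X}$, namely $\max_{j \in \mathcal{N}} x_{ij}(\mathbf{v}') \ge \max_{j \in \mathcal{N}} x_{ij}(\mathbf{v}^*)$ for all $i \in \mathcal{N}$. By construction, we
have $x_{ii}(\mathbf{v}') > \delta_{\boldsymbol{\mu}}$ for $i \in \mathcal{I}$.

• Suppose that $x_{jj}(\mathbf{v}^*) > \delta_{\boldsymbol{\mu}}$ for all $j \in \mathcal{N} \setminus \mathcal{I}$. Then pick an arbitrary $i' \in \mathcal{N} \setminus \mathcal{I}$, and define
a new payoff profile $\mathbf{v}' \in \mathcal{W}_{\boldsymbol{\mu}}$, with $v_i' = v_i^* - \varepsilon \cdot \bar{v}_i$ for all $i \in \mathcal{I}$ and $v_{i'}' = v_{i'}^* + |\mathcal{I}| \cdot \varepsilon \cdot \bar{v}_{i'}$,
where $\varepsilon > 0$ is small enough such that $v_i' \in (\mu_i, \bar{v}_i)$ for all $i \in \mathcal{N}$, and $\max_{i \in \mathcal{I}} x_{ii}(\mathbf{v}') <
\min_{i \in \mathcal{N} \setminus \mathcal{I}} x_{ii}(\mathbf{v}')$. In this way, we still have $x_{ii}(\mathbf{v}') = \max_{j} x_{ij}(\mathbf{v}')$ for $i \in \mathcal{I}$, and have
$x_{ii}(\mathbf{v}') = \min_{k} \max_{j} x_{kj}(\mathbf{v}')$ for $i \in \mathcal{I}$. However, $x_{ii}(\mathbf{v}') > x_{ii}(\mathbf{v}^*) = \delta_{\boldsymbol{\mu}}$ for $i \in \mathcal{I}$, which
contradicts the assumption that $\mathbf{v}^*$ is the optimal solution.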

Now we have proved that if $\mathcal{I}$ is nonempty, we only need to consider type-1 solutions.
Otherwise (when $\mathcal{I}$ is empty), we consider type-2 solutions. In the following, we solve the
optimization problem (80) by considering the solutions of the above two types.

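For a type-1 solution $\mathbf{v}^1$, since $\mathcal{I} = \mathcal{N}$, we have

$$\frac{\bar{v}_i - v_i^1}{\bar{v}_i - \mu_i} = c, \quad \forall i \in \mathcal{N}. \tag{81}$$

Using $\sum_{i=1}^{N} (v_i^1/\bar{v}_i) = 1$, we can solve $c$ as

$$c = \frac{N-1}{N - \sum_{i=1}^{N} \mu_i/\bar{v}_i}. \tag{82}$$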

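†We can always find a small enough $\varepsilon$ such that $v_i' > \mu_i$, because $v_i^* > \mu_i$, which results from the fact that $x_{ii}(\mathbf{v}^*) = \delta_{\boldsymbol{\mu}} < 1$.

For a type-2 solution $\mathbf{v}^2$, the optimal value is upper bounded by

$$\max_{i \in \mathcal{N}} \max_{j \ne i} \frac{y_{ij} - \mu_j}{y_{ij} - \underline{v}_j} \triangleq \frac{y_{i^*j^*} - \mu_{j^*}}{y_{i^*j^*} - \underline{v}_{j^*}}. \tag{83}$$

One $\mathbf{v}'$ that can possibly achieve the upper bound is

$$v_i' = \mu_i,\ \forall i \ne i^*, \quad \text{and} \quad v_{i^*}' = \bar{v}_{i^*} \Big( 1 - \sum_{j \ne i^*} \mu_j/\bar{v}_j \Big). \tag{84}$$

Then $\max_{j} x_{ij}(\mathbf{v}') = 1$ for all $i \ne i^*$. In other words, $\min_{i} \max_{j} x_{ij}(\mathbf{v}') = \max_{j} x_{i^*j}(\mathbf{v}')$. Since
$x_{i^*j^*}(\mathbf{v}') = \frac{y_{i^*j^*} - \mu_{j^*}}{y_{i^*j^*} - \underline{v}_{j^*}} \ge x_{i^*j}(\mathbf{v}')$ for all $j \ne i^*$, $x_{i^*j^*}(\mathbf{v}') = \max_{j} x_{i^*j}(\mathbf{v}')$ if and only if

$$x_{i^*j^*}(\mathbf{v}') \ge x_{i^*i^*}(\mathbf{v}') = \frac{\bar{v}_{i^*} - v_{i^*}'}{\bar{v}_{i^*} - \mu_{i^*}}. \tag{85}$$

Since $v_i \ge \mu_i$ for any $\mathbf{v} \in \mathcal{W}_{\boldsymbol{\mu}}$, we have $x_{i^*i^*}(\mathbf{v}') \le c$, where $c$ is the optimal value of type-1
solutions in (82). As a result, if $x_{i^*j^*}(\mathbf{v}') < x_{i^*i^*}(\mathbf{v}')$, the optimal value of type-2 solutions must
be smaller than that of type-1 solutions. Then there is no need to study type-2 solutions when
$x_{i^*j^*}(\mathbf{v}') < x_{i^*i^*}(\mathbf{v}')$. When $x_{i^*j^*}(\mathbf{v}') \ge x_{i^*i^*}(\mathbf{v}')$, the optimal value achieved by type-2 solutions
is

$$x_{i^*j^*}(\mathbf{v}') = \max_{i \in \mathcal{N}} \max_{j \ne i} \frac{y_{ij} - \mu_j}{y_{ij} - \underline{v}_j}. \tag{86}$$

Finally, the optimal value of (75) is the maximum of the optimal values of type-1 and type-2
solutions, which is expressed as in (76). ■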
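
As a numerical illustration of Lemma 2, the following Python sketch evaluates the per-payoff expression (74) and the closed form (76). It is only a sketch under stated assumptions: the function names `decomposition_delta` and `delta_mu`, the list-based data layout, and the two-user numbers are hypothetical and do not come from the paper. The arguments `v_bar`, `v_min`, `mu`, and `y` stand for $\bar{v}_i$, $\underline{v}_i$, $\mu_i$, and $y_{ij}$.

```python
def decomposition_delta(v, mu, v_bar, v_min, y):
    """Evaluate (74): the smallest discount factor that decomposes v."""
    n = len(v)
    candidates = []
    for i in range(n):
        # Feasibility term, as in (72).
        feasibility = (v_bar[i] - v[i]) / (v_bar[i] - mu[i])
        # Incentive term, as in (69), maximized over the other users.
        incentive = max((y[i][j] - v[j]) / (y[i][j] - v_min[j])
                        for j in range(n) if j != i)
        candidates.append(max(feasibility, incentive))  # expression (73)
    return min(candidates)


def delta_mu(mu, v_bar, v_min, y):
    """Evaluate the closed form (76)."""
    n = len(mu)
    # Value of type-1 solutions, as in (82).
    type1 = (n - 1) / (n - sum(m / vb for m, vb in zip(mu, v_bar)))
    # Upper bound for type-2 solutions, as in (83).
    type2 = max((y[i][j] - mu[j]) / (y[i][j] - v_min[j])
                for i in range(n) for j in range(n) if j != i)
    return max(type1, type2)


# Hypothetical two-user numbers, chosen only for illustration.
v_bar = [1.0, 1.0]
v_min = [0.0, 0.0]
mu = [0.2, 0.2]
y = [[0.0, 0.5], [0.5, 0.0]]  # diagonal entries are never read

c = delta_mu(mu, v_bar, v_min, y)
# Type-1 payoff profile obtained by solving (81) for v with this c.
v_type1 = [vb - c * (vb - m) for vb, m in zip(v_bar, mu)]
```

In this example the type-1 term dominates, and evaluating (74) at the type-1 payoff profile returns the same value as (76).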



B. Minimum Discount Factor to Support The Target Payoff 

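Now we can calculate the minimum discount factor to support the target payoff $\mathbf{v}^*$. As we
have discussed at the beginning of the proof, this is equivalent to finding the minimum discount
factor to support a self-generating set that includes the target payoff. Since we restrict our search to a
particular class of self-generating sets $\mathcal{W}_{\boldsymbol{\mu}}$, the discount factor obtained in this way is an upper
bound on the minimum discount factor. This upper bound can be written explicitly as

$$\delta(\mathbf{v}^*) = \min_{\boldsymbol{\mu}} \delta_{\boldsymbol{\mu}}, \quad \text{subject to } \mathbf{v}^* \in \mathcal{W}_{\boldsymbol{\mu}} \setminus \{\underline{\mathbf{v}}\}. \tag{87}$$

According to Lemma 2, $\delta_{\boldsymbol{\mu}}$ can be calculated using (76) as

$$\delta_{\boldsymbol{\mu}} = \max\left\{ \frac{N-1}{N - \sum_{i=1}^{N} \mu_i/\bar{v}_i},\ \max_{i \in \mathcal{N}} \max_{j \ne i} \frac{y_{ij} - \mu_j}{y_{ij} - \underline{v}_j} \right\}.$$

In the rest of this proof, we solve the above optimization problem to obtain $\delta(\mathbf{v}^*)$.
First, observe that

$$\max_{i \in \mathcal{N}} \max_{j \ne i} \frac{y_{ij} - \mu_j}{y_{ij} - \underline{v}_j} = \max_{j \in \mathcal{N}} \max_{i \ne j} \frac{y_{ij} - \mu_j}{y_{ij} - \underline{v}_j} \le 1. \tag{88}$$
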
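Defining $w_j = \max_{i \ne j} y_{ij}$, we can rewrite the optimization problem (87) as

$$\delta(\mathbf{v}^*) = \min_{\boldsymbol{\mu}} \max\left\{ \frac{N-1}{N - \sum_{i=1}^{N} \mu_i/\bar{v}_i},\ \max_{j \in \mathcal{N}} \frac{w_j - \mu_j}{w_j - \underline{v}_j} \right\} \tag{89}$$

$$\text{s.t. } \underline{v}_i \le \mu_i \le v_i^*, \quad \forall i \in \mathcal{N}.$$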



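Then we prove that one of the (possibly many) optimal solutions to (89) should satisfy

$$\frac{w_j - \mu_j}{w_j - \underline{v}_j} = C \le 1, \quad \forall j \in \mathcal{N}. \tag{90}$$

First, as long as $\boldsymbol{\mu}$ is such that $\underline{v}_i \le \mu_i \le v_i^*$ for all $i \in \mathcal{N}$, we have

$$\frac{N-1}{N - \sum_{i=1}^{N} \mu_i/\bar{v}_i} \le \frac{N-1}{N - \sum_{i=1}^{N} v_i^*/\bar{v}_i} = 1, \qquad \frac{w_j - \mu_j}{w_j - \underline{v}_j} \le 1. \tag{91}$$

Hence, any optimal solution $\boldsymbol{\mu}^*$ should satisfy

$$\frac{w_j - \mu_j^*}{w_j - \underline{v}_j} \le 1, \quad \forall j \in \mathcal{N}. \tag{92}$$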

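Suppose that at the optimal solution $\boldsymbol{\mu}^*$, there is a nonempty set $\mathcal{J} = \{ j \in \mathcal{N} : \frac{w_j - \mu_j^*}{w_j - \underline{v}_j} <
\max_{i \in \mathcal{N}} \frac{w_i - \mu_i^*}{w_i - \underline{v}_i} \}$. Then we define a new self-generating set $\mathcal{W}_{\boldsymbol{\mu}'}$ with $\boldsymbol{\mu}'$ satisfying $\frac{w_j - \mu_j'}{w_j - \underline{v}_j} =
\max_{i \in \mathcal{N}} \frac{w_i - \mu_i^*}{w_i - \underline{v}_i}$ for all $j \in \mathcal{J}$, and with $\mu_j' = \mu_j^*$ for all $j \notin \mathcal{J}$. Note that $\boldsymbol{\mu}'$ satisfies $\underline{v}_j \le \mu_j' < \mu_j^*$
for $j \in \mathcal{J}$, which implies $\frac{N-1}{N - \sum_{i=1}^{N} \mu_i'/\bar{v}_i} < \frac{N-1}{N - \sum_{i=1}^{N} \mu_i^*/\bar{v}_i}$. Consequently, we have

$$\max\left\{ \frac{N-1}{N - \sum_{i=1}^{N} \mu_i'/\bar{v}_i},\ \max_{j \in \mathcal{N}} \frac{w_j - \mu_j'}{w_j - \underline{v}_j} \right\} \le \max\left\{ \frac{N-1}{N - \sum_{i=1}^{N} \mu_i^*/\bar{v}_i},\ \max_{j \in \mathcal{N}} \frac{w_j - \mu_j^*}{w_j - \underline{v}_j} \right\},$$

which gives us an optimal solution $\boldsymbol{\mu}'$ whose corresponding $\mathcal{J}$ is empty. In other words, there
exists an optimal solution $\boldsymbol{\mu}^*$ such that

$$\frac{w_j - \mu_j^*}{w_j - \underline{v}_j} = \max_{i \in \mathcal{N}} \frac{w_i - \mu_i^*}{w_i - \underline{v}_i} = C, \quad \forall j \in \mathcal{N}. \tag{93}$$

Note that $\max_{j \in \mathcal{N}} \frac{w_j - v_j^*}{w_j - \underline{v}_j} \le C \le 1$ due to the constraints on $\mu_j$.

Since we can write $\mu_j^*$ as $\mu_j^* = w_j - (w_j - \underline{v}_j) \cdot C$, the optimization problem (89) can be
further simplified into an optimization problem with one decision variable $C$: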

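$$\delta(\mathbf{v}^*) = \min_{C} \max\left\{ \frac{N-1}{N - \sum_{i=1}^{N} \big( w_i - (w_i - \underline{v}_i) \cdot C \big)/\bar{v}_i},\ C \right\} \tag{94}$$

$$\text{s.t. } \max_{j \in \mathcal{N}} \frac{w_j - v_j^*}{w_j - \underline{v}_j} \le C \le 1.$$

Since $\frac{N-1}{N - \sum_{i=1}^{N} ( w_i - (w_i - \underline{v}_i) \cdot C )/\bar{v}_i}$ is decreasing in $C$, the optimal $C^*$ should satisfy

$$\frac{N-1}{N - \sum_{i=1}^{N} \big( w_i - (w_i - \underline{v}_i) \cdot C^* \big)/\bar{v}_i} = C^*, \tag{95}$$

if the first term in (94) is no smaller than $C$ at the lower bound $C = \max_{j \in \mathcal{N}} \frac{w_j - v_j^*}{w_j - \underline{v}_j}$; otherwise, the
lower bound itself is optimal. The only solution to (95) that is smaller than 1 can
be calculated as $\frac{2(N-1)}{N - T + \sqrt{(N-T)^2 + 4(N-1)(T-S)}}$, where $T = \sum_{i=1}^{N} (w_i/\bar{v}_i)$ and $S = \sum_{i=1}^{N} (\underline{v}_i/\bar{v}_i)$.

Finally, we obtain an upper bound of the minimum discount factor to sustain $\mathbf{v}^*$ from solving
(94):

$$\delta(\mathbf{v}^*) = \max\left\{ \max_{j \in \mathcal{N}} \frac{w_j - v_j^*}{w_j - \underline{v}_j},\ \frac{2(N-1)}{N - T + \sqrt{(N-T)^2 + 4(N-1)(T-S)}} \right\}. \tag{96}$$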
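
The bound can be evaluated numerically. The Python sketch below assumes that (95) is solved as the quadratic $(T-S)C^2 + (N-T)C - (N-1) = 0$ with $T = \sum_i w_i/\bar{v}_i$ and $S = \sum_i \underline{v}_i/\bar{v}_i$, and that its positive root is combined with the lower bound on $C$ from (94). The function names `crossing_point` and `discount_upper_bound` and the argument layout are hypothetical, introduced only for this illustration.

```python
import math


def crossing_point(v_bar, v_min, w):
    """Positive root of (T - S) C^2 + (N - T) C - (N - 1) = 0, i.e. of (95)."""
    n = len(v_bar)
    T = sum(wi / vb for wi, vb in zip(w, v_bar))
    S = sum(vm / vb for vm, vb in zip(v_min, v_bar))
    # Rationalized form of the quadratic formula (avoids dividing by T - S).
    return 2.0 * (n - 1) / ((n - T) + math.sqrt((n - T) ** 2
                                                + 4.0 * (n - 1) * (T - S)))


def discount_upper_bound(v_star, v_bar, v_min, w):
    """Larger of the lower bound on C in (94) and the root of (95)."""
    lower = max((wj - vs) / (wj - vm)
                for wj, vs, vm in zip(w, v_star, v_min))
    return max(lower, crossing_point(v_bar, v_min, w))
```

For instance, with two users, $\bar{v}_i = w_i = 1$, $\underline{v}_i = 0$, and target payoff $(0.5, 0.5)$, the lower bound is $0.5$ and (95) reduces to $1/(2C) = C$, so the sketch returns $\sqrt{0.5}$.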

Appendix D 
Proof of Theorem 2 

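In the algorithm in Table I, we first determine the self-generating set under the discount
factor $\delta(\mathbf{v}^*)$ that includes the target payoff $\mathbf{v}^*$. From the previous appendix, we know that the
self-generating set $\mathcal{W}_{\boldsymbol{\mu}}$ can be of the following form

$$\mathcal{W}_{\boldsymbol{\mu}} = \left\{ \mathbf{v} \in \mathbb{R}^N : \sum_{i=1}^{N} \frac{v_i}{\bar{v}_i} = 1,\ v_i \ge \mu_i,\ \forall i \in \mathcal{N} \right\} \cup \{\underline{\mathbf{v}}\}. \tag{97}$$

We also know that such a self-generating set exists, and satisfies (93), namely

$$\frac{w_j - \mu_j}{w_j - \underline{v}_j} = \max_{i \in \mathcal{N}} \frac{w_i - \mu_i}{w_i - \underline{v}_i} = C, \quad \forall j \in \mathcal{N}. \tag{98}$$

As long as we can find $C$, we can determine $\boldsymbol{\mu}$ by $\mu_i = w_i - (w_i - \underline{v}_i) \cdot C$ for all $i \in \mathcal{N}$.
According to the second part of the previous appendix, we know that $C = \delta(\mathbf{v}^*)$. Hence, we set
$\mu_i = w_i - (w_i - \underline{v}_i) \cdot \delta(\mathbf{v}^*)$ in the initialization.

Since $\mathcal{W}_{\boldsymbol{\mu}}$ is self-generating under the discount factor $\delta(\mathbf{v}^*)$, it is self-generating under the
given discount factor $\delta \ge \delta(\mathbf{v}^*)$ according to [21, Proposition 7.3.4]. Since the target payoff $\mathbf{v}^*$
is in the self-generating set $\mathcal{W}_{\boldsymbol{\mu}}$, it can be decomposed by a pure action $\mathbf{a}^0$ and a continuation
payoff $\mathbf{v}(1) \in \mathcal{W}_{\boldsymbol{\mu}}$. According to the definition of self-generation, the continuation payoff $\mathbf{v}(1)$
can also be decomposed by a pure action $\mathbf{a}^1$ and a continuation payoff $\mathbf{v}(2)$. Repeating this
procedure, we can obtain the desired sequence of pure-action profiles $\{\mathbf{a}^\tau\}_{\tau=0}^{\infty}$.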
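
The repeated decomposition described above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. It assumes that under the $i$th candidate profile user $i$ receives $\bar{v}_i$ and every other user receives 0 (the structure used in (68)), it picks the lowest-index admissible profile when several are admissible, and it runs for a finite `horizon`. The name `outcome_path` and its arguments are hypothetical.

```python
def outcome_path(v_target, v_bar, mu, delta, horizon, tol=1e-12):
    """Greedy decomposition in the spirit of Table I (finite-horizon sketch).

    Returns the list of selected user indices and the final continuation payoff.
    """
    n = len(v_bar)
    v = list(v_target)
    path = []
    for _ in range(horizon):
        for i in range(n):
            # Continuation payoff if profile i is played now.
            # Assumption: the stage payoff is v_bar[i] for user i, 0 for others.
            cont = [(v[j] - (1.0 - delta) * (v_bar[j] if j == i else 0.0)) / delta
                    for j in range(n)]
            # Admissible if every continuation payoff stays above mu.
            if all(cont[j] >= mu[j] - tol for j in range(n)):
                path.append(i)
                v = cont
                break
        else:
            raise ValueError("no admissible profile; delta may be too small")
    return path, v


# Hypothetical two-user example, chosen only for illustration.
delta = 0.9
path, v_end = outcome_path([0.5, 0.5], [1.0, 1.0], [0.2, 0.2], delta, 50)
```

By construction, the target payoff equals the discounted sum of the stage payoffs along `path` plus $\delta^{50}$ times the final continuation payoff, which gives a simple consistency check on the generated path.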
Fig. 2. Payoffs in the one-shot game model and the repeated game model with and without intervention for an example
two-user system. The dashed curve is the Pareto boundary of the set of feasible payoffs in one-shot games. The solid line
on the upper right is the Pareto boundary of the set of feasible payoffs in repeated games. The gray area is the set of Nash
equilibrium payoffs in one-shot games with intervention. The sets of subgame perfect equilibrium payoffs in repeated games
with and without intervention are within the boundaries shown in the figure (the discount factor approaches 1 in the cases of
repeated games). The dot is the Nash equilibrium payoff of the one-shot game or of the stage game of the repeated game.



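TABLE I

Algorithm to Generate $\{\mathbf{a}^\tau\}_{\tau=0}^{\infty}$, Users' Action Profiles in the Equilibrium Outcome Path.

Require: The target payoff $\mathbf{v}^*$ and the discount factor $\delta \ge \delta(\mathbf{v}^*)$
Initialization: Set $\mu_i = w_i - (w_i - \underline{v}_i) \cdot \delta$ for all $i \in \mathcal{N}$, $\tau = 0$, $\mathbf{v}(0) = \mathbf{v}^*$.
Repeat
    find $i^*$ such that $\frac{1}{\delta} v_j(\tau) - \frac{1-\delta}{\delta} \cdot u_j(\underline{a}_0, \tilde{\mathbf{a}}^{i^*}) \ge \mu_j$ for all $j \in \mathcal{N}$
    $\mathbf{v}(\tau+1) = \frac{1}{\delta} \mathbf{v}(\tau) - \frac{1-\delta}{\delta} \cdot \mathbf{u}(\underline{a}_0, \tilde{\mathbf{a}}^{i^*})$
    $\mathbf{a}^\tau = \tilde{\mathbf{a}}^{i^*}$
    $\tau \leftarrow \tau + 1$
Until $\mathbf{v}(\tau) = \mathbf{v}^*$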
Fig. 3. The automaton of the equilibrium strategy of the game with a mutual minmax profile $(\hat{a}_0, \hat{\mathbf{a}})$. Circles are states, where
$\{w_e(\tau)\}_{\tau=0}^{\infty}$ is the set of states in the equilibrium outcome path, and $\{w_p^i(\ell)\}_{\ell}$ is the set of states in the punishment phase
for user $i$. The initial state is $w_e(0)$. Solid arrows are the prescribed state transitions labeled by the action profiles leading to
the transitions. Dashed arrows are the state transitions when deviation happens. $a_i^*$ is user $i$'s best response to $\hat{a}_0$ and $\hat{\mathbf{a}}_{-i}$.

TABLE II

Performance Comparison Among Different Schemes Under Different Minimum Payoff Guarantees
(discount factors for repeated games shown in parentheses)

Min. payoff      Metric             NE [24]-[26]  One-shot [11][15][23]  Repeated w/o intervention  Repeated with intervention
$\gamma_i = 1$   Sum Payoff         39.3          110.4                  110.2 (1.000)              114.2 (0.987)
                 Absolute Fairness  4.0           9.6                    16.7 (0.861)               16.7 (0.840)
$\gamma_i = 3$   Sum Payoff         39.3          85.8                   108.2 (1.000)              108.2 (0.962)
                 Absolute Fairness  4.0           10.6                   16.7 (0.861)               16.7 (0.840)
$\gamma_i = 7$   Sum Payoff         N/A           64.4                   96.2 (0.960)               96.2 (0.910)
                 Absolute Fairness  N/A           10.3                   16.7 (0.861)               16.7 (0.840)
$\gamma_i = 14$  Sum Payoff         N/A           N/A                    75.2 (0.861)               75.2 (0.840)
                 Absolute Fairness  N/A           N/A                    16.7 (0.861)               16.7 (0.840)
Fig. 4. Minimum discount factors to support the pure-action profile that achieves maximum sum payoff, under different
punishment lengths and maximum intervention flow rates. $N = 4$. The service rate is 10 bits/s. The maximum flow rates
for all the users are 2.5 bits/s. The trade-off factors are $\beta_1 = \beta_2 = 2$ and $\beta_3 = \beta_4 = 3$.



[Figure panels: Service rate = Number of users]
Fig. 5. Performance comparison among the four schemes as the number of users increases. Blue lines with asterisks: repeated
games with intervention; blue lines with circles: repeated games without intervention; black lines with squares: one-shot games
with incentive schemes; red lines with crosses: Nash equilibrium of the one-shot game.
[Figure annotations: Min. payoff: 0.5; Min. payoff: 1.7; Min. payoff: 4.1. Legend: No intervention; Intervention with max. rate 0.5; Intervention with max. rate 1.0; Intervention with max. rate 2.5]
Fig. 6. The trade-offs between the minimum discount factor and the minimum payoff guarantee under different maximum
intervention flow rates. At the beginning of each curve, we mark the smallest minimum payoff guarantees we can impose, which 
indicates the largest feasible set of the protocol design problem with each maximum intervention flow rate. 




Fig. 7. The trade-offs between the required maximum intervention flow rate and the discount factor under different minimum 
payoff guarantees. 
Fig. 8. The trade-offs between the required maximum intervention flow rate and the minimum payoff guarantee under different
discount factors. 